
IEM 5033 Linear Optimization

Baski Balasundaram

Assistant Professor, Industrial Engineering & Management

Oklahoma State University, Stillwater, OK

[email protected]

Course textbook: Introduction to Linear Optimization by Bertsimas and Tsitsiklis, Athena Scientific (1997).


Chapters

1 Introduction and review
  Preliminaries
  Basic linear algebra review

2 The geometry of linear programming
  Polyhedra and convex sets
  Extreme points, vertices, and basic feasible solutions
  Polyhedra in standard form
  Degeneracy
  Existence of extreme points
  Optimality of extreme points
  Representation of bounded polyhedra∗
  Projections of polyhedra∗

3 The simplex method
  Optimality conditions
  Development of the simplex method
  Implementations of the simplex method
  Finding an initial BFS
  Anticycling: Lexicographic rule and Bland’s rule


4 Duality theory
  Motivation
  The duality theorems
  The dual simplex method
  Farkas’ Lemma and linear inequalities
  Cones and extreme rays
  Representation of polyhedra

5 Sensitivity analysis
  Local sensitivity analysis
  Global dependence on the b vector
  The set of all dual optimal solutions
  Global dependence on the c vector
  Parametric programming

6 Large scale optimization
  Column generation
  The cutting stock problem
  Cutting plane methods
  Dantzig-Wolfe decomposition

7 Interior point methods
  The affine scaling algorithm


Resource allocation problem

A manufacturing facility makes n products, 1, 2, …, n, using m different types of raw material, 1, …, m.

The cost of one unit of i-th raw material is ri .

We have bi units of raw material i available.

Producing one unit of product j requires aij units of raw material i.

Each unit of product j sells at the price of sj .

Decide the production quantities of each product to maximize the total profit.


Resource allocation problem

Indices:

i will be used for the i-th raw material, i = 1, . . . ,m;

j will be used for the j-th product, j = 1, . . . ,n.

The decision variables:

xj = the number of units of product j to manufacture, j = 1, …, n.


Resource allocation problem

The objective function (maximize the total profit):

The cost of raw material i needed to manufacture one unit of product j is aij ri.

Therefore the total cost of raw materials required for one unit of product j is

cj = ∑_{i=1}^m aij ri.

Profit per unit of product j is pj = sj − cj = sj − ∑_{i=1}^m aij ri.

The total profit is ∑_{j=1}^n pj xj. Thus, the objective is to maximize ∑_{j=1}^n pj xj.


Resource allocation problem

The resource constraints (do not exceed the availability of raw materials):

∑_{j=1}^n aij xj ≤ bi,  i = 1, 2, …, m.

Nonnegativity constraints: xj ≥ 0, j = 1,2, . . . ,n.


Resource allocation problem

The problem formulation:

maximize ∑_{j=1}^n pj xj

subject to ∑_{j=1}^n aij xj ≤ bi,  i = 1, 2, …, m

xj ≥ 0,  j = 1, 2, …, n

where pj = sj − ∑_{i=1}^m aij ri.

A problem of this form is called a linear program (LP).
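As a quick sanity check, here is a minimal sketch of this model on made-up data (2 raw materials, 3 products; all numbers are hypothetical), using SciPy's linprog. Since linprog minimizes, we maximize profit by minimizing its negative.

```python
import numpy as np
from scipy.optimize import linprog

r = np.array([2.0, 1.0])             # r_i: cost per unit of raw material i
b = np.array([100.0, 80.0])          # b_i: units of raw material i available
s = np.array([10.0, 8.0, 6.0])       # s_j: selling price of product j
A = np.array([[1.0, 2.0, 1.0],       # a_ij: raw material i per unit of product j
              [3.0, 1.0, 2.0]])

p = s - r @ A                        # p_j = s_j - sum_i a_ij r_i

# linprog minimizes, so maximize p'x via min -p'x subject to Ax <= b, x >= 0.
res = linprog(-p, A_ub=A, b_ub=b, bounds=[(0, None)] * 3)
print("production plan:", res.x)     # here: x = (12, 44, 0)
print("maximum profit :", -res.fun)  # here: 192
```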


Linear functions

Definition. A function f : Rn → R of the form

f(x) = c1x1 + c2x2 + … + cnxn,

where cj, j = 1, …, n, are constants and xj, j = 1, …, n, are variables, is called linear.


Key assumptions of linear programming

Proportionality assumption: Contribution of a variable is proportional to its value.

Additivity assumption: Contributions of variables are independent.

Divisibility assumption: Decision variables can take fractional values.

Certainty assumption: Each parameter is known with certainty.


Graphical solution to a minimization LP

Dorian Auto manufactures luxury cars and trucks. The company believes that its most likely customers are high-income women and men. To reach these groups, Dorian Auto has embarked on an ambitious TV advertising campaign and has decided to purchase 1-minute commercial spots on two types of programs: comedy shows and football games.

Each comedy commercial is seen by 7 million high-income women and 2 million high-income men.

Each football commercial is seen by 2 million high-income women and 12 million high-income men.

A 1-minute comedy ad costs $50,000 and a 1-minute football ad costs $100,000.

Dorian Auto would like its commercials to be seen by at least 28 million high-income women and 24 million high-income men. Use LP to determine how Dorian Auto can meet its advertising requirements at minimum cost.


Graphical solution to a minimization LP

Problem Formulation

The decision variables are:

x1 = number of 1-minute comedy ads

x2 = number of 1-minute football ads

LP formulation:

min z = 50x1 + 100x2   (objective function)
s.t. 7x1 + 2x2 ≥ 28    (high-income women)
2x1 + 12x2 ≥ 24        (high-income men)
x1, x2 ≥ 0             (non-negativity)
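A minimal sketch solving this LP numerically with SciPy's linprog (the ≥ constraints are passed as negated ≤ constraints):

```python
from scipy.optimize import linprog

c = [50, 100]                  # cost (in $1000s) of a comedy / football ad
A_ub = [[-7, -2],              # 7x1 + 2x2 >= 28  ->  -7x1 - 2x2 <= -28
        [-2, -12]]             # 2x1 + 12x2 >= 24 ->  -2x1 - 12x2 <= -24
b_ub = [-28, -24]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print(res.x, res.fun)          # (3.6, 1.4) and 320, matching point E
```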


Graphical solution to a minimization LP

The feasible region for the problem contains points for which the value of at least one variable can assume arbitrarily large values. The problem has an “unbounded” feasible region, but the optimal cost is finite.


Graphical solution to a minimization LP

Since Dorian wants to minimize total advertising costs, the optimal solution to the problem is the point in the feasible region with the smallest z value. An isocost line with the smallest z value passes through point E.

The optimal solution is at x1 = 3.6, x2 = 1.4, with z = 320. Because at point E both the high-income women and high-income men constraints are satisfied with equality, both constraints are binding.


LP assumptions vs reality (Dorian example)

Proportionality assumption: We assume that each extra comedy commercial adds exactly 7 million HIW and 2 million HIM viewers. This contradicts the empirical evidence that after a certain point advertising yields diminishing returns.

Additivity assumption: We assume that total ad viewers = comedy ad viewers + football ad viewers. Since many of the same people might view both ads, double-counting occurs.

Divisibility assumption: We assume that Dorian can purchase a fractional number of ad minutes. However, it is possible that only whole 1-minute commercials are available.

Certainty assumption: We assume that each parameter is known with certainty. However, there is no way of knowing with certainty how many viewers are added with each type of commercial.


Types of LPs

We can classify LPs based on the number of optimal solutions they have and the properties of their feasible region:

Some LPs have a unique optimal solution.

Some LPs have an infinite number of optimal solutions (alternative or multiple optimal solutions).

Some LPs have no feasible solutions (infeasible LPs).

Some LPs are unbounded: there are points in the feasible region with arbitrarily large (in a maximization problem) z-values.


Alternative optimal solutions

Consider the following LP:

max z = 3x1 + 2x2
s.t. (1/40)x1 + (1/60)x2 ≤ 1
(1/50)x1 + (1/50)x2 ≤ 1
x1, x2 ≥ 0

Any point (solution) falling on line segment AE will yield an optimal solution with z = 120.


Infeasible LPs

Consider the following LP:

max z = 3x1 + 2x2
s.t. (1/40)x1 + (1/60)x2 ≤ 1
(1/50)x1 + (1/50)x2 ≤ 1
x1 ≥ 30
x2 ≥ 30
x1, x2 ≥ 0

No feasible region exists in this case.


Unbounded LPs

Consider the following LP:

max z = 2x1 − x2

s.t. x1 − x2 ≤ 1

2x1 + x2 ≥ 6

x1, x2 ≥ 0

It is possible to find points in the feasible region with arbitrarily large z-values. Thus, this LP is unbounded.


Converting an LP to standard form

An LP in standard form has only equality and nonnegativity constraints.

Inequality constraints are converted into equality constraints by introducing a new variable on the left-hand side.

In the j-th ≤ constraint, we add a slack variable sj:

x1 + 2x2 + x3 ≤ 5  becomes  x1 + 2x2 + x3 + sj = 5  ⇔  sj = 5 − x1 − 2x2 − x3

In the j-th ≥ constraint, we subtract a surplus variable ej:

x1 + 2x2 + x3 ≥ 5  becomes  x1 + 2x2 + x3 − ej = 5  ⇔  ej = −5 + x1 + 2x2 + x3
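A small NumPy sketch of the mechanical step: appending an identity block of slack columns turns Ax ≤ b into [A | I](x, s) = b with s ≥ 0.

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0]])       # x1 + 2x2 + x3 <= 5
b = np.array([5.0])

m, n = A.shape
A_std = np.hstack([A, np.eye(m)])     # [A | I]: x1 + 2x2 + x3 + s1 = 5, s1 >= 0
print(A_std, b)                       # the slack is s1 = 5 - x1 - 2x2 - x3
# For a >= row, append -np.eye(m) instead (a surplus column).
```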


Example

Consider the following LP:

maximize 5x1 + 5x2 + 3x3
subject to x1 + 3x2 + x3 ≤ 3
−x1 + 3x3 ≤ 2
2x1 − x2 + 2x3 ≤ 4
2x1 + 3x2 − x3 ≤ 2
x1, x2, x3 ≥ 0.

Introducing slack variables, this is equivalent to:

z = 5x1 + 5x2 + 3x3

s1 = 3 − x1 − 3x2 − x3

s2 = 2 + x1 − 3x3

s3 = 4 − 2x1 + x2 − 2x3

s4 = 2 − 2x1 − 3x2 + x3

Here s1,s2,s3,s4 are slack variables.


The initial dictionary

For convenience, we will rename the slack variables as follows:

x4 = s1, x5 = s2, x6 = s3, x7 = s4.

We obtain the following dictionary:

z = 5x1 + 5x2 + 3x3

x4 = 3 − x1 − 3x2 − x3

x5 = 2 + x1 − 3x3

x6 = 4 − 2x1 + x2 − 2x3

x7 = 2 − 2x1 − 3x2 + x3


The initial “feasible dictionary”

z = 5x1 + 5x2 + 3x3

x4 = 3 − x1 − 3x2 − x3

x5 = 2 + x1 − 3x3

x6 = 4 − 2x1 + x2 − 2x3

x7 = 2 − 2x1 − 3x2 + x3

To get a feasible solution, set all variables on the rhs (which we will call non-basic variables) to 0:

x1 = x2 = x3 = 0 ⇒ x4 = 3, x5 = 2, x6 = 4, x7 = 2; z = 0.

Basic variables: x4, x5, x6, x7 (basis B = {4,5,6,7}); non-basic variables: x1, x2, x3 (N = {1,2,3}).

The corresponding solution is a basic feasible solution (bfs).


Iterative improvement: pivot variable (column)

z = 5x1 + 5x2 + 3x3

x4 = 3 − x1 − 3x2 − x3

x5 = 2 + x1 − 3x3

x6 = 4 − 2x1 + x2 − 2x3

x7 = 2 − 2x1 − 3x2 + x3

To increase the value of z, we can try to increase the value of one of the non-basic variables with a positive (and as large as possible) coefficient in the objective.

Thus, we pick a variable with the largest coefficient in the zero-row, say x1. We call this variable the pivot variable, and the corresponding column in the table is called the pivot column.


Pivot row

We want to increase the value of x1 while the remaining nonbasic variables remain equal to 0.

We want to preserve nonnegativity:

x4 = 3 − x1 ≥ 0
x5 = 2 + x1 ≥ 0
x6 = 4 − 2x1 ≥ 0
x7 = 2 − 2x1 ≥ 0

For all of these inequalities to be satisfied, we must have x1 ≤ 1. Thus, the largest feasible increase for x1 is equal to 1.

The largest possible increase corresponds to the smallest ratio of the free coefficient to the absolute value of the coefficient of x1 in the same row, over the rows in which the coefficient of x1 is negative.

We say that the row in which the smallest ratio is achieved wins the ratio test. This row is called the pivot row.


Pivot

z = 5x1 + 5x2 + 3x3

x4 = 3 − x1 − 3x2 − x3

x5 = 2 + x1 − 3x3

x6 = 4 − 2x1 + x2 − 2x3

x7 = 2 − 2x1 − 3x2 + x3

We pick the row that wins the ratio test.

We express the nonbasic variable in the pivot column through the basic variable in the pivot row:

x1 = 1 − (3/2)x2 + (1/2)x3 − (1/2)x7

Then we substitute this expression for x1 in the remaining rows of the dictionary.


Step 1 feasible dictionary

z  = 5 − (5/2)x2 + (11/2)x3 − (5/2)x7
x1 = 1 − (3/2)x2 + (1/2)x3 − (1/2)x7
x4 = 2 − (3/2)x2 − (3/2)x3 + (1/2)x7
x5 = 3 − (3/2)x2 − (5/2)x3 − (1/2)x7
x6 = 2 + 4x2 − 3x3 + x7

Basic variables: x1, x4, x5, x6 (B = {1,4,5,6}); non-basic variables: x2, x3, x7 (N = {2,3,7}).


Step 2 feasible dictionary

z  = 26/3 + (29/6)x2 − (2/3)x7 − (11/6)x6
x3 = 2/3 + (4/3)x2 + (1/3)x7 − (1/3)x6
x1 = 4/3 − (5/6)x2 − (1/3)x7 − (1/6)x6
x4 = 1 − (7/2)x2 + (1/2)x6
x5 = 4/3 − (29/6)x2 − (4/3)x7 + (5/6)x6

Basic variables: x3, x1, x4, x5;

Non-basic variables: x2, x7, x6.


Step 3 feasible dictionary

z  = 10 − 2x7 − x6 − x5
x2 = 8/29 − (8/29)x7 + (5/29)x6 − (6/29)x5
x3 = 30/29 − (1/29)x7 − (3/29)x6 − (8/29)x5
x1 = 32/29 − (3/29)x7 − (9/29)x6 + (5/29)x5
x4 = 1/29 + (28/29)x7 − (3/29)x6 + (21/29)x5

Every nonbasic variable now has a negative coefficient in the z-row, so no further increase is possible.

Optimal solution: x1 = 32/29, x2 = 8/29, x3 = 30/29;

Optimal objective value: z = 10.
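The walkthrough above can be condensed into a short tableau implementation. Below is a minimal sketch of the simplex method with Dantzig's largest-coefficient rule, assuming b ≥ 0 so the all-slack basis is feasible and with no anticycling rule; on this example it reproduces x = (32/29, 8/29, 30/29) and z = 10.

```python
import numpy as np

def simplex(c, A, b):
    """Maximize c'x subject to Ax <= b, x >= 0, assuming b >= 0."""
    m, n = A.shape
    T = np.hstack([A, np.eye(m), b.reshape(-1, 1)])   # rows: [A | I | b]
    z = np.concatenate([-c, np.zeros(m + 1)])         # objective row: [-c | 0 | 0]
    basis = list(range(n, n + m))                     # start from the all-slack basis
    while True:
        j = int(np.argmin(z[:-1]))                    # entering column (Dantzig rule)
        if z[j] >= 0:                                 # no improving column: optimal
            break
        ratios = [T[i, -1] / T[i, j] if T[i, j] > 1e-12 else np.inf
                  for i in range(m)]                  # ratio test picks the pivot row
        i = int(np.argmin(ratios))
        if ratios[i] == np.inf:
            raise ValueError("LP is unbounded")
        T[i] /= T[i, j]                               # pivot: eliminate column j
        for k in range(m):
            if k != i:
                T[k] -= T[k, j] * T[i]
        z -= z[j] * T[i]
        basis[i] = j
    x = np.zeros(n + m)
    x[basis] = T[:, -1]
    return x[:n], z[-1]

c = np.array([5.0, 5.0, 3.0])
A = np.array([[1, 3, 1], [-1, 0, 3], [2, -1, 2], [2, 3, -1]], dtype=float)
b = np.array([3.0, 2.0, 4.0, 2.0])
x, zval = simplex(c, A, b)
print(x, zval)   # expected: x = (32/29, 8/29, 30/29), z = 10
```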


Properties of feasible dictionaries

Every solution of the set of equations comprising a dictionary is also a solution of the original (step 0) dictionary, and vice versa.

Setting the right-hand-side variables to zero and evaluating the left-hand-side variables, we obtain a feasible solution.


The Fundamental Theorem of LP

Theorem. Every LP in standard form has the following three properties:

If it has no optimal solution, then it is either infeasible or unbounded.

If it has a feasible solution, then it has a basic feasible solution.

If it has an optimal solution, then it has a basic optimal solution.


Questions

INITIALIZATION: Will we always be able to start? How do we find the starting feasible dictionary? Does one always exist, given that the LP is feasible?

ITERATION: Can we always choose an entering variable, find the leaving variable, and construct the next feasible dictionary by pivoting?

TERMINATION: Is there a possibility that the simplex method will construct an endless sequence of solutions without ever reaching an optimal solution?

CORRECTNESS: When it does terminate with a solution claimed optimal, can we guarantee that it is indeed optimal? Why do we only look at BFSs? What if the optimal solution is “inside”? How and what does it detect in case of infeasible or unbounded LPs?

EFFICIENCY: Is this algorithm efficient? What is an efficient algorithm anyway?


Notations!

c = (c1, c2, …, cn)′, x = (x1, x2, …, xn)′, and b = (b1, b2, …, bm)′ are column vectors.

A is the m×n matrix with entry aij in row i and column j; its transpose A′ is the n×m matrix with entry aji in row i and column j.

0 = (0, 0, …, 0)′ and 1 = (1, 1, …, 1)′ denote the all-zeros and all-ones column vectors, and I denotes the identity matrix, with ones on the diagonal and zeros elsewhere.

For a square matrix B, its determinant is denoted by det(B), and when det(B) ≠ 0 the inverse is denoted by B−1.


Notations!

x′y = y′x = ∑_{i=1}^n xi yi,  ‖x‖ = √(∑_{i=1}^n xi²) = √(x′x).

a′i: row i of A;  Aj: column j of A;  ej: unit vector along dimension j (so Aej = Aj).

Ax = ∑_{j=1}^n Aj xj = (a′1x, a′2x, …, a′mx)′.


Linear programming problem in standard form

(LP): min c′x
subject to: Ax = b
x ≥ 0,

where A ∈ Rm×n, b ∈ Rm, c ∈ Rn.


Linear programming problem in general form

max c′x
subject to: a′ix ≥ bi, i ∈ M1,
a′ix ≤ bi, i ∈ M2,
a′ix = bi, i ∈ M3,
xj ≥ 0, j ∈ N1,
xj ≤ 0, j ∈ N2.

This is a general LP as long as the number of variables and constraints is finite. We usually don’t worry about the form while formulating LPs. STD-LP is simply a convenient form for studying the simplex method. The above can be rewritten as an equivalent problem in the STD-LP format via easy transformations.


Modeling tricks: free variables

min c′x
subject to: Ax = b  (x free)

is equivalent to

min c′y − c′z
subject to: Ay − Az = b
y ≥ 0, z ≥ 0.
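A minimal sketch of this split on a small hypothetical system whose unique solution has a negative component; the free x is recovered as y − z.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])
b = np.array([1.0, 3.0])                 # unique solution x = (2, -1), so x2 < 0

c_split = np.concatenate([c, -c])        # objective c'y - c'z
A_split = np.hstack([A, -A])             # constraints Ay - Az = b
res = linprog(c_split, A_eq=A_split, b_eq=b, bounds=[(0, None)] * 4)
y, z = res.x[:2], res.x[2:]
print("x =", y - z)                      # recovers (2, -1) despite y, z >= 0
```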


Modeling tricks: min-max problems

min max_i c′i x
subject to: Ax ≤ b
x ≥ 0,

can be equivalently written as

min z
subject to: z ≥ c′i x, ∀ i
Ax ≤ b
x ≥ 0,

where z ∈ R is a new variable introduced in the reformulation. A special case of this objective is |c′x|, where |·| denotes the absolute value; note that |c′x| = max{c′x, −c′x}. What if the objective were c′|x|, where |x| stands for the componentwise absolute value?
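A sketch of this reformulation with SciPy on a small hypothetical instance: the decision vector is (x, z), and each requirement z ≥ c_i′x becomes the row c_i′x − z ≤ 0.

```python
import numpy as np
from scipy.optimize import linprog

C = np.array([[1.0, 2.0],                 # rows c_i' of the inner max
              [3.0, -1.0]])
A = np.array([[1.0, 1.0],                 # x1 + x2 <= 4
              [-1.0, -1.0]])              # x1 + x2 >= 2 (negated)
b = np.array([4.0, -2.0])

k, n = C.shape
obj = np.concatenate([np.zeros(n), [1.0]])           # minimize z
A_ub = np.vstack([np.hstack([C, -np.ones((k, 1))]),  # c_i'x - z <= 0
                  np.hstack([A, np.zeros((len(b), 1))])])
b_ub = np.concatenate([np.zeros(k), b])
bounds = [(0, None)] * n + [(None, None)]            # x >= 0, z free
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x[:n], res.x[-1])                          # x = (1.2, 0.8), value 2.8
```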


Modeling tricks: single-ratio linear fractional problems

min c′x / d′x
subject to: Ax ≤ b
x ≥ 0,

when d′x > 0 for every feasible x, can be equivalently written as

min c′z
subject to: d′z = 1
Az − bt ≤ 0
z ≥ 0, t ≥ 0,

where z ∈ Rn and t ∈ R have been introduced in the reformulation; x is recovered as z/t. A special application of this technique is in Data Envelopment Analysis.
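A sketch of this (Charnes-Cooper) transformation on a small hypothetical instance with a bounded feasible region, so that t > 0 at the optimum and x can be recovered as z/t.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 3.0])
d = np.array([1.0, 1.0])
A = np.array([[-1.0, 0.0],      # x1 >= 1
              [0.0, -1.0],      # x2 >= 1
              [1.0, 0.0],       # x1 <= 4
              [0.0, 1.0]])      # x2 <= 4
b = np.array([-1.0, -1.0, 4.0, 4.0])

n = len(c)
obj = np.concatenate([c, [0.0]])                 # variables (z, t); minimize c'z
A_eq = np.concatenate([d, [0.0]]).reshape(1, -1) # d'z = 1
b_eq = np.array([1.0])
A_ub = np.hstack([A, -b.reshape(-1, 1)])         # Az - bt <= 0
b_ub = np.zeros(len(b))
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n + 1))
z, t = res.x[:n], res.x[-1]
print("x =", z / t, " ratio =", res.fun)         # x = (4, 1), ratio = 1.4
```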


Reading assignment

• Review formulation problems from the handout.

• Review Section 1.5 on basic set theory and linear algebra background and notations.

• Review Section 1.6 on Big-O, Ω, Θ notations.

• Review Section 1.8 on a brief history of linear programming.


Some useful properties of matrices

1 (A′)′ = A; (A + B)′ = A′ + B′; (AB)′ = B′A′

2 if det(A) ≠ 0, then det(A′) ≠ 0 and (A′)−1 = (A−1)′

3 (AB)−1 = B−1A−1

4 det(A) = det(A′); det(AB) = det(A)det(B)

5 det(A) ≠ 0 ⇔ A−1 exists ⇔ the columns (rows) of A are linearly independent

6 if A = [B 0; D C], where B and C are square, then det(A) = det(B)det(C); in particular, the determinant of a triangular matrix is the product of its diagonal elements

7 if B is obtained by interchanging two rows (columns) of A, then det(B) = −det(A)

8 if B is obtained from A by adding a scalar multiple of one row (column) to another, then det(B) = det(A)

9 if B is obtained from A by multiplying a row (column) by a scalar k, then det(B) = k · det(A)
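These identities are easy to sanity-check numerically; a small NumPy sketch on random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

assert np.allclose((A @ B).T, B.T @ A.T)                       # (AB)' = B'A'
assert np.allclose(np.linalg.det(A), np.linalg.det(A.T))       # det(A) = det(A')
assert np.allclose(np.linalg.det(A @ B),
                   np.linalg.det(A) * np.linalg.det(B))        # det(AB) = det(A)det(B)
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))        # (AB)^-1 = B^-1 A^-1

C = A.copy()
C[[0, 1]] = C[[1, 0]]                                          # swap two rows
assert np.allclose(np.linalg.det(C), -np.linalg.det(A))        # det flips sign
print("all identities verified")
```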


Vector combinations

Suppose we are given vectors v1, v2, …, vk ∈ Rn.

1 ∑_{i=1}^k λi vi is called a linear combination of the given vectors for λi ∈ R ∀i.

2 ∑_{i=1}^k λi vi is called an affine combination of the given vectors for λi ∈ R ∀i if in addition ∑_{i=1}^k λi = 1.

3 ∑_{i=1}^k λi vi is called a conic combination of the given vectors for λi ∈ R+ ∀i, that is, λi ≥ 0.

4 ∑_{i=1}^k λi vi is called a convex combination of the given vectors for λi ≥ 0 ∀i if in addition ∑_{i=1}^k λi = 1.


Illustration of various vector combinations


Linear independence, subspaces and bases

Definition. Vectors v1, v2, …, vk ∈ Rn are said to be linearly independent if the unique solution to the system of equations ∑_{i=1}^k λi vi = 0 is λi = 0 ∀i. Otherwise, they are said to be linearly dependent.

Definition. A nonempty subset S of Rn is called a subspace if αx + βy ∈ S for all x, y ∈ S and for all α, β ∈ R. If a subspace S ≠ Rn, then it is called a proper subspace.

Definition. The span of v1, v2, …, vk ∈ Rn is the subspace of Rn defined by the collection of all linear combinations of v1, v2, …, vk.

Definition. Given a subspace S of Rn with S ≠ {0}, a basis of S is a collection of linearly independent vectors that span S. Every basis of S has the same number of vectors, and this number is called the dimension of S.


Subspaces

• {0} is a 0-dimensional subspace of Rn; lines through the origin are 1-dimensional subspaces of Rn; planes through the origin are 2-dimensional subspaces of Rn; Rn is an n-dimensional subspace

• Every proper subspace of Rn has dimension smaller than n

• If S is a proper subspace of Rn, then there exists a nonzero vector a orthogonal to S, that is, a′x = 0 for every x ∈ S

• If S is a proper subspace of Rn, then S⊥ = {y ∈ Rn : x′y = 0 ∀ x ∈ S} is called its orthogonal complement; S⊥ is also a subspace

• If dim(S) = m < n, then there exist n − m linearly independent vectors orthogonal to S

• If the span S of vectors x1, …, xK has dimension m, there exists a basis of S consisting of m of the vectors x1, …, xK


Subspaces

Theorem. Suppose the span S of x1, …, xK has dimension m. If k ≤ m and x1, …, xk are linearly independent, we can form a basis of S by starting with x1, …, xk and choosing m − k of the vectors xk+1, …, xK.

Proof. If every vector in xk+1, …, xK can be expressed as a linear combination of x1, …, xk, then every vector in S is also a linear combination of x1, …, xk, and hence they form a basis with m = k. Otherwise, at least one of the vectors xk+1, …, xK is linearly independent of x1, …, xk, resulting in a collection of k + 1 linearly independent vectors. Repeating this process m − k times results in the desired basis.


Subspaces associated with a matrix

Let A be an m×n matrix.
• The column space of A is the subspace of Rm spanned by the columns of A
• The row space of A is the subspace of Rn spanned by the rows of A
• The dimension of the column space is the number of linearly independent columns of A
• The dimension of the row space is the number of linearly independent rows of A
• The dimensions of the column space and the row space are always equal, and this number is called the rank of A
• rank(A) ≤ min(m,n); if rank(A) = min(m,n), A is said to be of full rank
• {x : Ax = 0} is called the null space of A; it is a subspace of Rn and its dimension is n − rank(A)
• Every subspace S of Rn has a representation S = {x ∈ Rn : Ax = 0}, and dim(S) = n − rank(A)


Systems of linear equations

Definition. A system of linear equations Ax = b,

1 has no solution if rank(A,b) > rank(A),

2 has a unique solution if rank(A,b) = rank(A) = n,

3 has infinitely many solutions if rank(A,b) = rank(A) < n,

where rank(A,b) is the rank of A augmented with an additionalcolumn b.
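A small NumPy sketch of this classification via matrix ranks:

```python
import numpy as np

def classify(A, b):
    rA = np.linalg.matrix_rank(A)
    rAb = np.linalg.matrix_rank(np.column_stack([A, b]))  # rank of [A, b]
    n = A.shape[1]
    if rAb > rA:
        return "no solution"
    return "unique solution" if rA == n else "infinitely many solutions"

A = np.array([[1.0, 1.0], [1.0, -1.0]])
print(classify(A, np.array([2.0, 0.0])))          # unique solution
print(classify(A[:1], np.array([2.0])))           # infinitely many solutions
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]]),
               np.array([1.0, 2.0])))             # no solution
```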


Affine subspaces

Definition. A subset S of Rn is called an affine subspace if αx + (1−α)y ∈ S for all x, y ∈ S and for all α ∈ R.

• Subspaces of Rn are precisely the affine subspaces containing the origin

• If S0 is a subspace of Rn and x0 is some vector, the translation of S0, S = S0 + x0 = {x + x0 : x ∈ S0}, is an affine subspace

• Every nonempty affine subspace S is a translation of a unique subspace S0

• The dimension of an affine subspace S is defined to be the dimension of the underlying subspace S0

• Given b ∈ Rm and A ∈ Rm×n, S = {x ∈ Rn : Ax = b} is an affine subspace of Rn. Moreover, every affine subspace has such a representation


Hyperplanes, halfspaces, polyhedra

Definition. The set {x ∈ Rn : a′x = b}, where a ∈ Rn, a ≠ 0, b ∈ R, is called a hyperplane.

Definition. The set {x ∈ Rn : a′x ≥ b}, where a ∈ Rn, a ≠ 0, b ∈ R, is called a halfspace.

Definition. A polyhedron is the intersection of a finite number of halfspaces, represented by {x ∈ Rn : Ax ≥ b} where A ∈ Rm×n, b ∈ Rm. A bounded polyhedron is called a polytope.


Convex sets and convex hull

Definition. A set S is convex if ∀x, y ∈ S, λx + (1−λ)y ∈ S for any λ ∈ [0,1].

Remark. In other words, a set S ⊆ Rn is said to be convex if the line segment joining any two points in the set is contained in the set.

Definition. Let S be an arbitrary set in Rn. The convex hull of S, denoted by conv(S), is the intersection of all convex sets containing S. Equivalently, conv(S) is the minimal convex set containing S.


Convex hull of finite points

Definition. For S = {x1, …, xk}, x ∈ conv(S) if and only if there exist λ1, …, λk such that

x = ∑_{j=1}^k λj xj   (1)

∑_{j=1}^k λj = 1   (2)

λj ≥ 0 ∀j = 1, …, k.   (3)

Remark. In other words, the convex hull of x1, …, xk is the collection of all convex combinations of these vectors.
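Conditions (1)-(3) make testing membership in the convex hull an LP feasibility problem in the λj; a sketch using linprog with a zero objective:

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points, x):
    P = np.asarray(points, dtype=float)           # rows are x^1, ..., x^k
    k = P.shape[0]
    A_eq = np.vstack([P.T, np.ones(k)])           # sum_j lambda_j x^j = x, sum_j lambda_j = 1
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * k)
    return res.status == 0                        # 0 means a feasible lambda was found

square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(in_convex_hull(square, np.array([0.5, 0.5])))   # True: inside
print(in_convex_hull(square, np.array([1.5, 0.5])))   # False: outside
```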


Convex sets: some properties

Theorem

1 Intersection of convex sets is convex.

2 Every polyhedron is a convex set.

3 A convex combination of a finite number of elements of a convex set also belongs to that set.

4 The convex hull of a finite number of vectors is a convex set.


Convex Functions

Definition (3.1.1)

Let f : S → R, where S is a nonempty convex set in Rn. The function f is said to be convex on S if

f(λx1 + (1−λ)x2) ≤ λf(x1) + (1−λ)f(x2)

for each x1, x2 ∈ S and for each λ ∈ [0,1]. The function is said to be strictly convex on S if the above inequality is strict for each pair of distinct x1 and x2 and for each λ ∈ (0,1). The function f is said to be concave (strictly concave) on S if −f is convex (strictly convex) on S.


Convex Functions - Useful Facts

1 Let f1, f2, …, fk : Rn → R be convex functions. Then:

  1.1 f(x) = ∑_{j=1}^k αj fj(x), where αj > 0 for j = 1, 2, …, k, is a convex function.

  1.2 f(x) = max{f1(x), f2(x), …, fk(x)} is a convex function.

2 Suppose that g : Rn → R is a concave function. Let S = {x : g(x) > 0} and define f : S → R as f(x) = 1/g(x). Then f is convex over S.

3 Let g : Rm → R be a convex function, and let h : Rn → Rm be an affine function of the form h(x) = Ax + b, where A is an m×n matrix and b is an m-vector. Then the composite function f : Rn → R defined as f(x) = g(h(x)) is a convex function.

4 If f : Rn → R is both convex and concave, then f is affine: f(x) = g′x + c.


Level Sets of Convex Functions

Let f : S → R be a convex function over S.

Definition. The set Sα = {x ∈ S : f(x) ≤ α}, α ∈ R, is called the lower-level set. Henceforth, we refer to this set simply as the level set.

Remark. Let S be a nonempty convex set in Rn, and let f : S → R be a convex function. The level set Sα = {x ∈ S : f(x) ≤ α}, α ∈ R, is a convex set.


Epigraph and Hypograph of a function

Let S be a nonempty set in Rn, and let f : S → R.

Definition. The epigraph of f, denoted by epi(f), is the subset of Rn+1 defined by {(x,y) : x ∈ S, y ∈ R, y ≥ f(x)}.

Definition. The hypograph of f, denoted by hypo(f), is the subset of Rn+1 defined by {(x,y) : x ∈ S, y ∈ R, y ≤ f(x)}.

Theorem. Let S be a nonempty convex set in Rn, and let f : S → R. Then f is convex if and only if epi(f) is a convex set.


Extreme point, vertex of a polyhedron

Definition. Let P be a polyhedron. A vector x ∈ P is an extreme point of P if we cannot find y, z ∈ P, both distinct from x, and a scalar λ ∈ [0,1] such that x = λy + (1−λ)z.

Definition. Let P be a polyhedron. A vector x ∈ P is a vertex of P if there exists some c such that c′x < c′y for all y ∈ P \ {x}.


Active constraints

Consider a polyhedron P ⊂ Rn defined in terms of linear equality and inequality constraints

a′ix ≥ bi, i ∈ M1,
a′ix ≤ bi, i ∈ M2,
a′ix = bi, i ∈ M3,

where M1, M2, M3 are finite index sets.

Definition. If a vector x∗ satisfies a′ix∗ = bi for some i ∈ M1 ∪ M2 ∪ M3, we say that the corresponding constraint is active or binding at x∗.


Basic feasible solutions

Theorem. Let x∗ ∈ Rn and let I = {i : a′ix∗ = bi} be the index set of active constraints. Then the following are equivalent:

1 There exist n vectors in the set {ai : i ∈ I} which are linearly independent.

2 The span of the vectors ai, i ∈ I, is all of Rn.

3 The system of equations a′ix = bi, i ∈ I, has a unique solution.

Definition. Consider a polyhedron P defined by linear equality and inequality constraints, and let x∗ ∈ Rn.

1 The vector x∗ is a basic solution if:
(i) all equality constraints are active;
(ii) there are n linearly independent constraints among the constraints active at x∗.

2 If x∗ is a basic solution that satisfies all the constraints, it is called a basic feasible solution.


Theorem. Let P = {x : a′ix ≥ bi, i ∈ M; a′ix = bi, i ∈ M′} be a nonempty polyhedron and let x∗ ∈ P. Then the following are equivalent:

1 x∗ is a vertex;

2 x∗ is an extreme point;

3 x∗ is a basic feasible solution.

Proof. Vertex ⇒ extreme point:
Suppose x∗ ∈ P is a vertex. Then ∃c ∈ Rn such that c′x∗ < c′y ∀ y ∈ P \ {x∗}. If y, z ∈ P are distinct from x∗, then c′(λy + (1−λ)z) = λc′y + (1−λ)c′z > c′x∗ for any λ ∈ [0,1]. Hence x∗ cannot be written as a convex combination of two other points of P, and is an extreme point.


Proof. Extreme point ⇒ BFS:
Suppose x∗ ∈ P is not a BFS. Then there do not exist n linearly independent vectors in the collection ai, i ∈ I(x∗). Let d ≠ 0 be a vector in the orthogonal complement of the span of ai, i ∈ I(x∗). There exists an ε > 0 such that y = x∗ + εd and z = x∗ − εd satisfy y, z ∈ P, so x∗ = (y + z)/2 is not an extreme point.

BFS ⇒ vertex:
Let x∗ be a BFS and let c = ∑_{i∈I(x∗)} ai. Then c′x∗ = ∑_{i∈I(x∗)} a′ix∗ = ∑_{i∈I(x∗)} bi. For any x ∈ P and any i, we have a′ix ≥ bi, and c′x = ∑_{i∈I(x∗)} a′ix ≥ ∑_{i∈I(x∗)} bi. So x∗ is an optimal solution to min{c′x : x ∈ P}. Furthermore, equality holds iff a′ix = bi for all i ∈ I(x∗). Since x∗ is a BFS, there are n linearly independent constraints active at x∗, and hence x∗ is the unique solution to the system a′ix = bi, i ∈ I(x∗). Hence c′x∗ < c′y for all y ∈ P \ {x∗}, and x∗ is a vertex.

Corollary. Given a finite number of linear inequality constraints, there can only be a finite number of basic or basic feasible solutions.


Polyhedra in standard form

Let P = {x ∈ Rn : Ax = b, x ≥ 0} be a polyhedron in standard form, and let the dimensions of A be m×n.

Theorem. Consider the constraints Ax = b and x ≥ 0, and assume A has full row rank. Then x ∈ Rn is a basic solution if and only if Ax = b and there exist indices B(1), …, B(m) such that:

1 The columns AB(1), …, AB(m) are linearly independent;

2 If i ∉ {B(1), …, B(m)}, then xi = 0.


Proof. Sufficiency:
Let x ∈ Rn and suppose ∃B(1), …, B(m) satisfying (1) and (2). The active constraints xi = 0, i ∉ {B(1), …, B(m)}, and Ax = b imply

Ax = ∑_{i=1}^m AB(i) xB(i) = b,

which has a unique solution since the AB(i) are linearly independent. Since I(x) contains all the linear equalities and determines a unique solution, it contains n linearly independent constraints. Hence x is a basic solution.


Proof. Necessity:
Suppose x is a basic solution; we show that (1) and (2) are satisfied. Let xB(1), …, xB(k) be the nonzero components of x. Then the system Ax = b, xi = 0 for i ∉ {B(1), …, B(k)}, has a unique solution, since x is basic. That is, ∑_{i=1}^k AB(i) xB(i) = b has a unique solution. It follows that the columns AB(1), …, AB(k) are linearly independent, hence k ≤ m. [If not, ∃λ ≠ 0, λ ∈ Rk, such that ∑_{i=1}^k AB(i) λi = 0, implying an alternate solution xB(i) + λi, i = 1, …, k, with the rest 0.] Since rank(A) = m, we can find m − k additional columns AB(k+1), …, AB(m) so that AB(i), i = 1, …, m, are linearly independent. This proves (1). If i ∉ {B(1), …, B(m)}, then xi = 0, as every nonzero component is included in B(1), …, B(k). This proves (2).


Constructing basic solutions

Given P = {x ∈ Rn : Ax = b, x ≥ 0} with rank(A) = m, basic solutions can be constructed as follows.

1 Choose m linearly independent columns AB(1), …, AB(m);

2 Let xi = 0 for all i ∉ {B(1), …, B(m)};

3 Solve ∑_{i=1}^m AB(i) xB(i) = b.

Remark. If the basic solution x is non-negative, then x ∈ P and it is a basic feasible solution. B(1), …, B(m) are called basic indices. xB = [xB(1), …, xB(m)]′ are referred to as basic variables; the rest are called nonbasic variables. The columns AB(1), …, AB(m) are called basic columns; since they are linearly independent, they form a basis of Rm. B = [AB(1) AB(2) ⋯ AB(m)] is called the basis matrix.
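A minimal NumPy sketch of this construction on a small hypothetical standard-form instance:

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
m, n = A.shape

basic = [0, 1]                                    # chosen basic indices B(1), B(2)
B = A[:, basic]                                   # basis matrix
assert np.linalg.matrix_rank(B) == m              # columns must be independent

x = np.zeros(n)
x[basic] = np.linalg.solve(B, b)                  # solve B x_B = b; nonbasic x_i = 0
print(x, "basic feasible" if np.all(x >= 0) else "basic but infeasible")
```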


[Figure: the vectors A1, A2, A3, A4 = −A1 and b in R2, for a standard form P with a 2×4 matrix A.]

Enumerate the bases and identify which ones yield basic feasible solutions.


Bases, basic solutions, and adjacency

• Different basic solutions must correspond to different bases

• Different bases, {B(1), …, B(m)} ≠ {B̄(1), …, B̄(m)}, could correspond to the same basic solution

• Two distinct basic solutions are adjacent if there are n − 1 linearly independent constraints that are active at both points

• For standard form problems, two bases are adjacent if they share all but one basic column

• Adjacent basic solutions can be obtained from two adjacent bases

• If two adjacent bases lead to distinct basic solutions, then the latter are also adjacent


Full row rank assumption

Theorem. Let P = {x : Ax = b, x ≥ 0} be a nonempty polyhedron, where A is m×n with rows a′1, …, a′m. Suppose that rank(A) = k < m and that the rows a′i1, …, a′ik are linearly independent. Consider the polyhedron

Q = {x : a′i1x = bi1, …, a′ikx = bik, x ≥ 0}.

Then Q = P.


Proof. W.l.o.g. assume i1 = 1, …, ik = k. Clearly, P ⊆ Q. If we show that Q ⊆ P, we are done; that is, we show that an arbitrary y ∈ Q is also in P.
Since P ≠ ∅, rank(A) = rank([A,b]) = k. Then (a′1, b1), …, (a′k, bk) span the row space of [A,b], and hence any (a′i, bi) can be expressed as a linear combination of (a′1, b1), …, (a′k, bk). That is, ∃λij, j = 1, …, k, such that (a′i, bi) = ∑_{j=1}^k λij (a′j, bj) for every row (a′i, bi).
Now consider y ∈ Q. Then for any row a′i of A,

a′iy = ∑_{j=1}^k λij a′jy = ∑_{j=1}^k λij bj = bi.

Hence y ∈ P, so Q ⊆ P.


Dimension of P

Given P = {x ∈ Rn : Ax = b, x ≥ 0} ≠ ∅, the dimension of P is given by

dim(P) = n − rank(A=),

where A= is the system of implied equalities. Hence A= contains A as a submatrix, together with any of the nonnegativity constraints that hold as equalities at every point in P:

A= = [A; e′i1; …; e′ik], where xij = 0 ∀ x ∈ P for j = 1, …, k.


Degeneracy

Definition. A basic solution x ∈ Rn is said to be degenerate if more than n constraints are active at x.

Definition. Consider the standard form polyhedron P = {x ∈ Rn : Ax = b, x ≥ 0} and let x be a basic solution. Let m be the number of rows of A. Then x is a degenerate basic solution if more than n − m components of x are zero.


Degeneracy

Which of these basic solutions are degenerate?


Degeneracy in standard P

[Figure: a polyhedron with basic feasible solutions A and B and the hyperplanes x1 = 0, …, x6 = 0 labeled.]

An n−m dimensional illustration of degeneracy, where n = 6 and m = 4. A is a nondegenerate BFS with n−m = 2 variables at zero (x4 = x5 = 0); B is a degenerate BFS with more than n−m variables at zero (x1 = x5 = x6 = 0).


Degeneracy is not purely geometric

[Figure: the feasible set in R3, the segment joining (0,0,1) and (1,1,0).]

In other words, degeneracy is representation dependent. Consider the standard form P = {x ∈ R3 : x1 − x2 = 0, x1 + x2 + 2x3 = 2, x ≥ 0} with n = 3, m = 2, n − m = 1. The point (1,1,0) is nondegenerate because only one variable is at zero; (0,0,1) is degenerate because two variables are at zero. But in the nonstandard form P = {x ∈ R3 : x1 − x2 = 0, x1 + x2 + 2x3 = 2, x1 ≥ 0, x3 ≥ 0}, (0,0,1) is nondegenerate, as only n = 3 constraints are active at that point.


Existence of extreme points

Definition. A polyhedron P ⊂ Rn contains a line if there exist a vector x ∈ P and a nonzero vector d ∈ Rn such that x + λd ∈ P for all scalars λ.

Theorem. Suppose that the polyhedron P = {x ∈ Rn : a′ix ≥ bi, i = 1, …, m} is nonempty. Then the following are equivalent:

1 The polyhedron P has at least one extreme point.

2 The polyhedron P does not contain a line.

3 There exist n vectors out of the family a1, …, am which are linearly independent.


Proof. No line ⇒ extreme point:
Suppose P does not contain a line. Let x ∈ P and let I(x) be the active set. If I(x) contains n linearly independent constraints, then x is a BFS. If I(x) contains fewer, the span of ai, i ∈ I(x), is a proper subspace of Rn and there exists a nonzero d such that a′id = 0, i ∈ I(x). Consider the line y = x + λd, λ ∈ R. Then a′iy = bi for i ∈ I(x) and all λ ∈ R, so the constraints in I(x) remain active at y for all λ ∈ R. Since P does not contain a line, as λ varies some constraint must be violated. Hence ∃λ∗ and j ∉ I(x) such that a′j(x + λ∗d) = bj. We claim aj is linearly independent of ai, i ∈ I(x): a′j(x + λ∗d) = bj and a′jx ≠ bj imply a′jd ≠ 0, and since d is orthogonal to the span of ai, i ∈ I(x), aj must be linearly independent of the active constraints at x. Now I(x + λ∗d) ⊃ I(x) and it contains at least one more linearly independent constraint. Repeating this process starting from x + λ∗d, we eventually find a point y∗ ∈ P such that I(y∗) contains n linearly independent constraints.


Proof. Extreme point ⇒ n linearly independent constraints out of a1, …, am: trivial.
n linearly independent vectors among a1, …, am ⇒ no line: W.l.o.g. assume a1, …, an are linearly independent. Suppose P contains a line x + λd, where x ∈ P and d ≠ 0. Then a′i(x + λd) ≥ bi for i ∈ {1, …, m} and all λ ∈ R. Hence a′id = 0 for all i = 1, …, m. Since a1, …, an are linearly independent, we have d = 0, contradicting our assumption of a nonzero d. Hence P does not contain a line.

Corollary. A nonempty polytope must contain an extreme point. A nonempty standard form polyhedron contains an extreme point.


Optimality of extreme points

Theorem
Consider min{c′x : x ∈ P}. Suppose that P = {x : Ax ≥ b} has at least one extreme point and there exists an optimal solution. Then, there exists an optimal solution which is an extreme point of P.

Theorem
Consider min{c′x : x ∈ P}. Suppose P has at least one extreme point. Then, either the optimal cost is equal to −∞, or there exists an extreme point which is optimal.

Corollary
The LP min{c′x : x ∈ P} is either infeasible, or unbounded, or has an optimal solution.

Proof (of the first theorem).
Let Q ≠ ∅ be the set of optimal solutions and v the optimal cost. Then Q = {x : Ax ≥ b, c′x = v} is also a polyhedron. Since P has an extreme point, P contains no line; Q ⊆ P, so Q contains no line either and must have an extreme point, say x∗. Suppose x∗ is not an extreme point of P. Then there exist y, z ∈ P \ {x∗} such that x∗ = λy + (1−λ)z for some λ ∈ (0, 1). Now c′x∗ = λc′y + (1−λ)c′z = v, and since c′y ≥ v and c′z ≥ v, this forces c′y = c′z = v. Hence y, z ∈ Q, contradicting the fact that x∗ is an extreme point of Q.

Proof (of the second theorem).
Define (for this proof) the "rank" of a point x ∈ P as the maximum number of linearly independent constraints in I(x). Suppose the optimal cost is finite. Let x ∈ P = {x : Ax ≥ b} have rank k < n. We demonstrate a y ∈ P with c′y ≤ c′x whose rank is greater. Since the rank of x is k < n, there exists d ≠ 0 such that a′id = 0 for all i ∈ I(x) and c′d ≤ 0 (replacing d by −d if necessary).
Suppose c′d < 0. Consider the ray y = x + λd, λ > 0, which satisfies all constraints in I(x). Since the optimal cost is finite, the ray cannot stay in P indefinitely, so there is some λ∗ such that a′j(x + λ∗d) = bj for some j ∉ I(x). Now c′(x + λ∗d) < c′x and the rank of x + λ∗d is at least k + 1.
Suppose c′d = 0. Since P has an extreme point, it contains no line, so moving along d or −d we again reach a point x + λ∗d with c′(x + λ∗d) = c′x and rank at least k + 1.
In either case, given any x ∈ P of rank smaller than n, we can find another point in P of higher rank without worsening the objective value. We can repeat this until we find a point of rank n, i.e., a BFS. Applying the argument to an optimal solution w, there exists an optimal BFS w∗ with c′w∗ = c′w.

Representation of bounded polyhedra

Theorem
A nonempty, bounded polyhedron is the convex hull of its extreme points.

Remark
A similar representation exists for unbounded polyhedra too, using extreme rays in addition to extreme points.

Projections of polyhedra

Definition
Let x ∈ ℝn and k ≤ n. The projection mapping πk : ℝn → ℝk projects x onto its first k coordinates:

πk(x) = πk(x1, . . . , xn) = (x1, . . . , xk).

The projection of a set S ⊆ ℝn is defined as

Πk(S) = {πk(x) : x ∈ S}.

Alternately,

Πk(S) = {(x1, . . . , xk) : ∃ xk+1, . . . , xn s.t. (x1, . . . , xn) ∈ S}.

Theorem
Let P ⊆ ℝn+k be a polyhedron. Then, the set

{x ∈ ℝn : ∃ y ∈ ℝk s.t. (x, y) ∈ P}

is also a polyhedron.

Theorem
Let P ⊆ ℝn be a polyhedron and let A be an m×n matrix. Then, the set

Q = {Ax : x ∈ P} ⊆ ℝm

is also a polyhedron.

Theorem
The convex hull of a finite number of vectors is a polyhedron.
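One classical way to prove the first theorem is Fourier-Motzkin elimination: eliminating one variable from the system Ax ≥ b produces a new system of linear inequalities in the remaining variables that describes exactly the projection. The following is a minimal NumPy sketch of one elimination step, under our own naming and tolerance choices; it does not remove redundant rows, so the row count can grow quadratically per step.

    import numpy as np

    def eliminate_last_variable(A, b, tol=1e-9):
        """One Fourier-Motzkin step: from {x in R^n : Ax >= b},
        return (A2, b2) with {y in R^{n-1} : A2 y >= b2} equal to
        the projection onto the first n-1 coordinates."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        last = A.shape[1] - 1
        pos = A[:, last] > tol           # rows giving lower bounds on x_n
        neg = A[:, last] < -tol          # rows giving upper bounds on x_n
        zer = ~pos & ~neg                # rows not involving x_n
        rest = A[:, :last]
        # normalize so the coefficient of x_n is +1 (pos) or -1 (neg)
        Rp, bp = rest[pos] / A[pos, last][:, None], b[pos] / A[pos, last]
        Rn, bn = rest[neg] / -A[neg, last][:, None], b[neg] / -A[neg, last]
        rows, rhs = [rest[zer]], [b[zer]]
        # x_n is feasible iff every lower bound <= every upper bound,
        # which reads (r_i + r_j)' y >= beta_i + beta_j for each pair
        for i in range(len(bp)):
            for j in range(len(bn)):
                rows.append(np.atleast_2d(Rp[i] + Rn[j]))
                rhs.append(np.atleast_1d(bp[i] + bn[j]))
        return np.vstack(rows), np.hstack(rhs)

Repeating the step k times projects out the last k variables, which is the construction behind the first theorem.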

The simplex method - (over)simplified
• min{c′x : Ax = b, x ≥ 0} is the standard LP, and P = {x : Ax = b, x ≥ 0} is the standard polyhedron, everywhere in this chapter.
• We know that if an LP in standard form has an optimal solution, then there exists a BFS that is optimal.
• The simplex method starts at a BFS you provide, and moves ("along an edge") from one BFS to an adjacent BFS which decreases the objective.
• When a BFS is reached from which no improving move is possible, the method terminates.

⋆ Please review the simplex method you have studied in your intro OR course, and solve the following problem:

min  −5x1 − 4x2 − 3x3
s.t.  2x1 + 3x2 + x3 + x4 = 5
      4x1 + x2 + 2x3 + x5 = 11
      3x1 + 4x2 + 2x3 + x6 = 8
      xi ≥ 0, ∀ i = 1, . . . , 6
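If you want to check your hand computation, the instance can be fed to an off-the-shelf solver; a quick sketch using scipy.optimize.linprog (variable bounds default to x ≥ 0, matching the standard form):

    from scipy.optimize import linprog

    c = [-5, -4, -3, 0, 0, 0]            # x4, x5, x6 act as slack variables
    A_eq = [[2, 3, 1, 1, 0, 0],
            [4, 1, 2, 0, 1, 0],
            [3, 4, 2, 0, 0, 1]]
    b_eq = [5, 11, 8]

    res = linprog(c, A_eq=A_eq, b_eq=b_eq, method="highs")
    print(res.x, res.fun)  # this instance attains cost -13 at x = (2, 0, 1, 0, 1, 0)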

Feasible directions

Definition
Let x ∈ P and d ≠ 0. Then d is said to be a feasible direction at x if there exists a positive scalar θ for which x + θd ∈ P.

• Let x be a BFS, let B(1), . . . , B(m) be the basic variable indices, and let B = [AB(1) · · · AB(m)] be the corresponding basis matrix.
• We wish to move from x to a new BFS x + θd, so that a nonbasic xj is increased from 0 to θ while the other nonbasic variables are kept at 0.
• In such a d, di must be 0 for every nonbasic i ≠ j and dj = 1; the basic subvector xB changes to xB + θdB.

A(x + θd) = b ⇒ Ad = 0 = ∑_{i=1}^{m} AB(i) dB(i) + Aj = B dB + Aj ⇒ dB = −B−1Aj
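In code, the construction of the j-th basic direction is a single linear solve; a small NumPy sketch with our own helper name, where `basis` is the list of 0-based indices B(1), . . . , B(m):

    import numpy as np

    def basic_direction(A, basis, j):
        """Direction d with d_j = 1, d_i = 0 for the other nonbasic i,
        and d_B = -B^{-1} A_j, so that A d = 0."""
        _, n = A.shape
        d = np.zeros(n)
        d[j] = 1.0
        B = A[:, basis]                          # basis matrix
        d[basis] = -np.linalg.solve(B, A[:, j])  # d_B = -B^{-1} A_j
        return d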

Basic directions

• The direction d so constructed (dB = −B−1Aj; dj = 1; di = 0 otherwise) is called the j-th basic direction; this guarantees that the equality constraints are met as we move away from x.
• In order to maintain nonnegativity of x + θd, it is sufficient to maintain nonnegativity of xB + θdB; consider the following two cases:
1. x is a nondegenerate BFS: then xB > 0, and for a sufficiently small θ we can ensure xB + θdB ≥ 0. Such a basic direction d would be a feasible direction.
2. x is a degenerate BFS: some xB(i) = 0, and if dB(i) < 0, then for any positive θ the direction d is infeasible.

Reduced costs

• If d is the j-th basic direction, then c′d is the rate of change of cost along d, given by

c′d = c′B dB + cj = cj − c′B B−1 Aj.

• cj is the cost per unit increase in xj, while −c′B B−1 Aj is the cost of the compensating change in the basic variables required to increase xj by one unit while maintaining Ax = b.

Definition
Let x be a basic solution, let B be an associated basis matrix, and let cB be the vector of costs of the basic variables. For each j, we define the reduced cost c̄j of the variable xj as

c̄j = cj − c′B B−1 Aj.
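All reduced costs can be computed at once from the simplex multipliers p′ = c′B B−1, at the price of one linear solve with B′; a sketch (helper name ours, with A and c as NumPy arrays):

    import numpy as np

    def reduced_costs(A, c, basis):
        """cbar' = c' - c_B' B^{-1} A; entries on basic columns are zero."""
        B = A[:, basis]
        p = np.linalg.solve(B.T, c[basis])  # simplex multipliers p = B^{-T} c_B
        return c - A.T @ p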

Reduced costs and optimality

Theorem
Consider a BFS x associated with a basis matrix B, and let c̄ be the vector of reduced costs.
1. If c̄ ≥ 0, then x is optimal.
2. If x is optimal and nondegenerate, then c̄ ≥ 0.

Definition
A basis matrix B is said to be optimal if:
1. B−1b ≥ 0, and
2. c̄′ = c′ − c′B B−1 A ≥ 0′.

Proof.
(1): Let y ∈ P and let d = y − x. Then Ad = 0 = B dB + ∑_{i∈N} Ai di, where N is the set of nonbasic indices, so dB = −∑_{i∈N} B−1 Ai di, and

c′d = c′B dB + ∑_{i∈N} ci di = ∑_{i∈N} (ci − c′B B−1 Ai) di = ∑_{i∈N} c̄i di.

For any nonbasic i, xi = 0 and yi ≥ 0, so di = yi − xi ≥ 0; together with c̄i ≥ 0 this gives c′d ≥ 0, i.e., c′y ≥ c′x for every y ∈ P, and x is optimal.

Proof.
(2): Suppose x is nondegenerate and c̄j < 0 for some j. Then xj must be nonbasic, and c̄j is the rate of cost change along the j-th basic direction, which is feasible (by nondegeneracy) and guarantees a cost decrease; hence x cannot be optimal.

Remark
Note that the reduced cost of every basic variable is zero:

c̄B(i) = cB(i) − c′B B−1 AB(i) = cB(i) − c′B ei = cB(i) − cB(i) = 0.

Remark
If x is a BFS and c̄ is the associated reduced cost vector, then for any feasible direction d with x + θd ∈ P, θ > 0, we have Ax + θAd = b ⇒ Ad = 0. So Ad = 0 = B dB + ∑_{i∈N} Ai di, where N is the set of nonbasic indices; hence dB = −∑_{i∈N} B−1 Ai di, and

c′d = c′B dB + ∑_{i∈N} ci di = ∑_{i∈N} (ci − c′B B−1 Ai) di = ∑_{i∈N} c̄i di.

Development of the simplex method

• Suppose every extreme point of P = {x : Ax = b, x ≥ 0} is nondegenerate.
• Suppose we are at some BFS x, with associated reduced costs c̄.
• If c̄ ≥ 0, x is optimal.
• If not, then since x is nondegenerate, it cannot be optimal. Let c̄j < 0 for a nonbasic variable xj; then the j-th basic direction d is a feasible direction and a direction of cost decrease.
• As we move along this direction, we trace points of the form x + θd. Note that xj = θ ≥ 0 and xi = 0 for all i ≠ B(1), . . . , B(m), j. The basic variables vary as xB + θ(−B−1Aj). The objective function varies as c′x + θc′d = c′x + θc̄j.
• The cost decreases as we increase θ and move along this direction. This can be done as long as we remain feasible. So define

θ∗ = max{θ ≥ 0 : x + θd ∈ P}.

A formula for θ∗ - the "minimum ratio rule"

A(x + θd) = b for all θ ∈ ℝ, so it suffices to check nonnegativity to verify whether x + θd ∈ P.
• If d ≥ 0, then x + θd ≥ 0 for all θ ≥ 0. Then θ∗ = ∞ and the LP is unbounded.
• If di < 0 for some i, then xi + θdi ≥ 0 ⇒ θ ≤ −xi/di. This must be met for every i with di < 0. Hence,

θ∗ = min{ −xi/di : di < 0 }.

Note that di = 0 or 1 for nonbasic variables. Hence we have

θ∗ = min{ −xB(i)/dB(i) : i = 1, . . . , m, dB(i) < 0 }.

Note that θ∗ > 0 since x is nondegenerate. It is finite because we have assumed there exists some dB(i) < 0. θ∗ will be referred to as the step size.
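A direct transcription of the minimum ratio rule, written in terms of u = B−1Aj = −dB (a sketch; the tolerance is our choice):

    import numpy as np

    def min_ratio(x_B, u, tol=1e-9):
        """theta* = min over {i : u_i > 0} of x_B(i)/u_i; returns
        (theta*, k) with k the minimizing position, or (inf, None)
        when u <= 0, i.e. the LP is unbounded along d."""
        ratios = np.full(len(u), np.inf)
        mask = u > tol
        ratios[mask] = x_B[mask] / u[mask]
        k = int(np.argmin(ratios))
        return (ratios[k], k) if np.isfinite(ratios[k]) else (np.inf, None)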

The next BFS

Once a finite θ∗ > 0 is obtained, we have the new feasible solution y = x + θ∗d ∈ P. Let k be the minimizing index in the θ∗ formula. That is,

θ∗ = min{ −xB(i)/dB(i) : i = 1, . . . , m, dB(i) < 0 } = −xB(k)/dB(k).

Note that yB(k) = xB(k) + θ∗dB(k) = 0; yB(i) = xB(i) + θ∗dB(i) ≥ 0 for i = 1, . . . , m; yj = xj + θ∗dj = θ∗ > 0; and yi = 0 for i ≠ B(1), . . . , B(m), j.
We claim that y is a BFS and the associated basic indices are B̄(i), i = 1, . . . , m, where

B̄(i) = B(i) for i ≠ k, and B̄(k) = j.

The new basis matrix is B̄ = [AB(1) · · · AB(k−1) Aj AB(k+1) · · · AB(m)]. That is, the nonbasic variable xj has entered the basis and the basic variable xB(k) has left it. Since we already know that yi = 0 for all i ≠ B̄(1), . . . , B̄(m), to show that y is indeed a BFS it suffices to show that AB(i), i ≠ k, and Aj are linearly independent.

Theorem
1. The columns AB(i), i ≠ k, and Aj are linearly independent, and therefore B̄ is a basis matrix.
2. The vector y = x + θ∗d is the basic feasible solution associated with the basis matrix B̄.

Given a nondegenerate BFS x ∈ P, the new BFS x + θ∗d is distinct from x since θ∗ is positive; since d is a direction of cost decrease, the objective at the new BFS is strictly smaller. Hence, we have moved to a new BFS with a better objective. This is a typical iteration of the simplex method, also known as a simplex pivot. For convenience, define u ∈ ℝm as

u = −dB = B−1Aj,

where Aj is the column entering the basis; thus ui = −dB(i) for i = 1, . . . , m.

An iteration of the simplex method
1. Given a basis B = [AB(1) · · · AB(m)] and an associated BFS x:
2. Compute the reduced costs c̄j = cj − c′B B−1 Aj for each nonbasic j. If c̄j ≥ 0 for all nonbasic j, then x is optimal; terminate. Otherwise choose some j for which c̄j < 0.
3. Compute u = B−1Aj. If no component of u is positive (⇒ d ≥ 0), then θ∗ = ∞ and the LP is unbounded; terminate.
4. If some component of u is positive, let θ∗ = min{ xB(i)/ui : i = 1, . . . , m, ui > 0 }.
5. Let k be such that θ∗ = xB(k)/uk. Form a new basis by replacing AB(k) with Aj; the new BFS is y = x + θ∗d.
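Putting the steps together, one iteration can be sketched in NumPy as follows. This is an illustrative dense implementation, not the book's pseudocode: the smallest-index entering rule, the tolerances, and the function name are our own choices, and A, b, c are assumed to be NumPy arrays with `basis` a list of 0-based basic indices.

    import numpy as np

    def simplex_iteration(A, b, c, basis, tol=1e-9):
        """One iteration at the (assumed nondegenerate) BFS determined
        by `basis`. Returns ('optimal', basis), ('unbounded', j), or
        ('moved', new_basis)."""
        m, n = A.shape
        B = A[:, basis]
        x_B = np.linalg.solve(B, b)            # current basic variables
        p = np.linalg.solve(B.T, c[basis])     # simplex multipliers
        cbar = c - A.T @ p                     # step 2: reduced costs
        entering = [j for j in range(n)
                    if j not in basis and cbar[j] < -tol]
        if not entering:
            return "optimal", basis            # cbar >= 0: optimal
        j = entering[0]                        # smallest-index rule
        u = np.linalg.solve(B, A[:, j])        # step 3: u = B^{-1} A_j
        if np.all(u <= tol):
            return "unbounded", j              # d >= 0: cost -> -inf
        ratios = np.where(u > tol, x_B / np.where(u > tol, u, 1.0), np.inf)
        k = int(np.argmin(ratios))             # step 4: ratio test
        new_basis = list(basis)
        new_basis[k] = j                       # step 5: j enters, B(k) leaves
        return "moved", new_basis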

The simplex method for nondegenerate problems

Theorem
Assume that the feasible set is nonempty and that every basic feasible solution is nondegenerate. Then, the simplex method terminates after a finite number of iterations. At termination, one of the following is true:
1. We have an optimal basis B and an associated basic feasible solution which is optimal.
2. We have a vector d satisfying Ad = 0, d ≥ 0, and c′d < 0, and the optimal cost is −∞.

The simplex method for degenerate problems

In the presence of degeneracy, the following situations could arise that are not possible if every BFS is nondegenerate.
1. If the current BFS x is degenerate, θ∗ could be zero, in which case y = x. This happens when there is a k such that xB(k) = 0 and dB(k) < 0. Note that we can still carry out a pivot, replacing AB(k) by Aj to obtain a new basis B̄, with y = x as the new/old BFS.
2. Even when θ∗ is positive: if there exist two basic indices B(k) and B(r) such that θ∗ = xB(k)/uk = xB(r)/ur, then yB(r) = xB(r) + θ∗dB(r) = 0 as B(k) leaves the basis, resulting in a degenerate BFS y.

• The first observation is problematic, as it suggests that the simplex method outlined earlier could "stall" at a degenerate BFS. However, the fact that we can still carry out a pivot operation is encouraging: we might be able to come out of the degenerate BFS (either concluding optimality or moving to an improving BFS) after a finite number of iterations, though that number could be exponentially large, yet finite!
• A bigger problem is if we return to a previously visited basis of the degenerate BFS, resulting in the algorithm "cycling" through a set of bases without terminating (an infinite loop).
• There are, however, options available to us in each iteration in how we carry out a pivot. A careful choice will prevent this from occurring, guaranteeing finite termination of the simplex method under such pivot rules.

Pivot Selection

We have options in selecting an entering variable: any variable with c̄j < 0 is a candidate. We also have options in selecting a leaving variable if several basic indices achieve the step size θ∗: any k for which θ∗ = −xB(k)/dB(k) is a candidate. A pivot rule describes how these choices are to be made.

Entering variable selection:
1. Choose a column Aj, with c̄j < 0, whose reduced cost is the most negative.
2. Choose a column with c̄j < 0 for which the corresponding cost decrease θ∗|c̄j| is largest.
3. Choose the column Aj, with c̄j < 0, whose index j is smallest.*

Leaving variable selection:
1. Among all basic indices achieving θ∗, choose the one with the smallest subscript.*

(*) The two starred smallest-subscript choices together constitute Bland's rule, one of the anticycling rules studied later in this chapter.

Basic observations

• The vector u = B−1Aj is computed frequently: it is used to compute reduced costs, the direction of the move, and the step size θ∗.
• How u is computed, and what information is carried from one iteration to the next, leads to different implementations of the simplex method.
• Given an m×m basis matrix B and b ∈ ℝm, computing the inverse of B or solving a linear system of the form Bx = b takes O(m3) operations.
• Computing a matrix-vector product Bb takes O(m2) operations.
• Computing an inner product p′b takes O(m) operations.

Naive implementation

1. Given basic indices B(1), . . . , B(m):
2. Form B; compute p′ = c′B B−1 by solving the linear system p′B = c′B for the unknown p, the vector of simplex multipliers.
3. Compute the reduced costs c̄j = cj − p′Aj and identify an entering variable according to the chosen pivot rule.
4. Once an entering column Aj is identified, solve Bu = Aj to determine u = B−1Aj.
5. Now find the direction d, find θ∗, and find the next BFS.

O(m3) operations are needed to solve p′B = c′B and Bu = Aj. Computing all reduced costs requires O(mn) operations. The computational effort per iteration is O(m3 + mn).

Revised simplex method

• A significant part of the computational burden is the solution of two linear systems of equations. If B−1 is provided initially, and we update it effectively from iteration to iteration, we can save computational effort.
• Let B = [AB(1) AB(2) · · · AB(m)] be the basis matrix at the beginning of an iteration and let B̄ = [AB(1) · · · AB(k−1) Aj AB(k+1) · · · AB(m)] be the basis matrix at the beginning of the next iteration. That is, AB(k) has been replaced by Aj.

Definition
Given a matrix, the operation of adding a constant multiple of one row to the same row or to another row is called an elementary row operation.

Q =
[ 1  0  2 ]
[ 0  1  0 ]
[ 0  0  1 ]

C =
[ 1  2 ]
[ 3  4 ]
[ 5  6 ]

Suppose we wish to multiply the third row of C by 2 and add it to the first row. The matrix product QC accomplishes exactly that:

QC =
[ 11  14 ]
[  3   4 ]
[  5   6 ]

Notice the structure of Q and the location of the entry 2: Q = I + D13, where D13 is the matrix of zeros in every position except (row 1, column 3), which is 2.
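A two-line NumPy check of this example:

    import numpy as np

    Q = np.eye(3)
    Q[0, 2] = 2.0                  # Q = I + D_13 with beta = 2
    C = np.array([[1., 2.], [3., 4.], [5., 6.]])
    print(Q @ C)                   # [[11. 14.], [3. 4.], [5. 6.]]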

Elementary row operations

• Multiplying row j by β and adding it to row i (for i ≠ j) is the same as pre-multiplying by the matrix Q = I + Dij, where Dij is a matrix with entry (i, j) equal to β and zeros elsewhere.
• det(Q) = 1, so Q is invertible.
• A sequence of K elementary row operations is the same as pre-multiplying by the invertible matrix QK QK−1 · · · Q1.
• Since B−1B = I and B−1AB(i) = ei, we have

B−1B̄ = [e1 · · · ek−1 u ek+1 · · · em],

i.e., B−1B̄ is the identity matrix with its k-th column replaced by u = B−1Aj.

We wish to change the matrix B−1B̄ above into the identity I by the following operations:
1. For each row i ≠ k, we add row k times −ui/uk to row i (note that uk > 0); this replaces ui by zero.
2. We divide row k by uk; this replaces uk by one.
Let Q be the invertible matrix that accomplishes this sequence of elementary row operations. Then QB−1B̄ = I, i.e., QB−1 = B̄−1.
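The two operations translate directly into code; a sketch of the B̄−1 update (our own helper name, assuming uk > 0):

    import numpy as np

    def update_basis_inverse(B_inv, u, k):
        """Apply to the rows of B^{-1} the row operations that send
        u to e_k; the result is Bbar^{-1}."""
        B_inv = B_inv.copy()
        B_inv[k] /= u[k]                       # pivot element -> 1
        for i in range(B_inv.shape[0]):
            if i != k:
                B_inv[i] -= u[i] * B_inv[k]    # zero out u_i in row i
        return B_inv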

An iteration of the revised simplex method
1. Given B(1), . . . , B(m), the associated BFS x, and B−1:
2. Compute p′ = c′B B−1 and c̄j = cj − p′Aj. If they are all nonnegative, the current BFS is optimal, so terminate; else, choose a j for which c̄j < 0.
3. Compute u = B−1Aj. If no component of u is positive, the optimal cost is −∞, so terminate.
4. If some component of u is positive, let θ∗ = min{ xB(i)/ui : i = 1, . . . , m, ui > 0 }.
5. Let k be such that θ∗ = xB(k)/uk; form a new basis by replacing B(k) with j. If y is the new BFS, then yj = θ∗ and yB(i) = xB(i) − θ∗ui for i = 1, . . . , m, i ≠ k.
6. Form the m×(m+1) matrix [B−1 | u]. Add to each one of its rows a multiple of row k to make the last column become ek. The first m columns then yield B̄−1.

Full tableau implementation

Maintain and update the m×(n+1) matrix

B−1[b | A],

whose columns are B−1b, B−1A1, . . . , B−1An. We will refer to B−1b as the zeroth column and to B−1Ai as the i-th column of the tableau.

The column u = B−1Aj corresponding to the entering variable is called the pivot column; if the k-th basic variable leaves the basis, the k-th row is called the pivot row; their common element is called the pivot element. Note that uk is the pivot element, and uk > 0 if we are carrying out a pivot operation.


Full tableau implementation

• With the equality constraints given to us as Ax = b, and the current basis matrix B, the system we track is B−1Ax = B−1b.

• When the basis changes, we update B−1[b|A] to the tableau of the new basis by left-multiplying by a matrix Q chosen so that QB−1 equals the inverse of the new basis matrix.

• We also augment the matrix B−1[b|A] with a zeroth row; the entry in the zeroth row and zeroth column is −c′BB−1b, the negative of the current cost. The rest of this row consists of the reduced costs, that is, the vector c̄′ = c′ − c′BB−1A.

• The update rule ensures that the reduced cost of the entering variable is made 0 by adding a scalar multiple of the pivot row to the zeroth row.

The full tableau thus has the form

−c′BB−1b | c′ − c′BB−1A
B−1b     | B−1A


An iteration of the full tableau implementation

1 Given the initial tableau associated with a BFS x and basis B.

2 If the reduced costs in the zeroth row of the tableau are all nonnegative, the current BFS is optimal; terminate. Else, choose a j for which c̄j < 0.

3 Consider u = B−1Aj, the pivot column. If no component of u is positive, the optimal cost is −∞; terminate.

4 If some component of u is positive, let

θ∗ = min{ xB(i)/ui : i = 1, ..., m, ui > 0 }.

5 Let k be such that θ∗ = xB(k)/uk; form the new basis by replacing AB(k) with Aj.

6 Add to each row of the tableau a constant multiple of row k (the pivot row) so that the pivot element becomes one and all other entries of the pivot column become zero. (A sketch of this pivot appears below.)
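A hedged sketch of the pivot in step 6: the function below applies the elementary row operations to an (m + 1) × (n + 1) tableau whose zeroth row holds the reduced costs and whose remaining rows hold B−1[b|A]. The function name and layout are illustrative assumptions.

import numpy as np

def tableau_pivot(T, k, j):
    """Pivot the tableau T on element T[k, j] (row k >= 1 is the pivot row)."""
    T = np.array(T, dtype=float)     # work on a float copy
    T[k] /= T[k, j]                  # scale so the pivot element becomes 1
    for i in range(T.shape[0]):
        if i != k:
            T[i] -= T[i, j] * T[k]  # zero out the rest of the pivot column
    return T

These row operations realize exactly the left-multiplication by Q described above.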


Some remarks on the implementations

Read the comparison on pp. 105-106.

                           Full tableau   Revised simplex
Memory                     O(mn)          O(m²)
Worst-case time per iter.  O(mn)          O(mn)
Best-case time per iter.   O(mn)          O(m²)

• Revised simplex is generally preferable, as we save by not computing B−1A completely.

• It also permits us to use sparse matrix representations when A is sparse.

• Reinversion

• Eta factorization of the basis



Artificial variables

• Consider the problem min{c′x : Ax = b, x ≥ 0} with b ≥ 0.

• Introduce a vector y ∈ Rm of artificial variables and solve the auxiliary problem

min{y1 + y2 + ··· + ym : Ax + y = b, x ≥ 0, y ≥ 0}.

• The auxiliary LP has an optimal cost of zero if and only if the original LP is feasible.

• If the auxiliary LP solves to an optimal value of zero with all the artificial variables nonbasic, we have an initial basic feasible solution to the original LP.

• Sometimes we could have a degenerate optimal solution to the auxiliary LP with some y variables in the basis at zero. This requires us to drive the artificial variables out of the basis. To drive out a particular basic yi at zero, identify a column Aj whose entry in the corresponding row of B−1A is nonzero and carry out a pivot operation on it. Repeat this until all basic y variables (at 0) are out. This is guaranteed to work as long as A has full row rank.


The two-phase simplex method

A complete algorithm for min{c′x : Ax = b, x ≥ 0} with b ≥ 0.

Phase I.

1 Introduce artificial variables y1, ..., ym, if necessary, and solve the auxiliary problem with cost y1 + ··· + ym.

2 If the optimal cost is positive, the original problem is infeasible; terminate.

3 If the optimal cost is zero and no artificial variable is in the basis, a feasible basis for the original problem has been obtained.

4 If the k-th basic variable is artificial, examine the k-th entry of B−1Aj for j = 1, ..., n. If all these entries are zero, the k-th row is redundant and is eliminated. Otherwise, if the k-th entry of the j-th column is nonzero, carry out a pivot operation with that entry as the pivot element. Repeat this step until all artificial variables are out of the basis.


Phase II.

1 Use the basis from Phase I as the initial basis to solve the original problem using the simplex method.

2 Compute the reduced costs of all variables for this initial basis, using the original cost coefficients.

3 Apply the simplex method to the original problem.

This overall algorithm is designed to initialize, go through a finite number of simplex pivot operations, and terminate, as long as the simplex method invoked can avoid cycling. (A sketch of the Phase I setup appears below.)
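As a small sketch of the Phase I setup (the function name and data handling are assumptions for illustration; any simplex routine can then be applied to the auxiliary problem):

import numpy as np

def phase_one_data(A, b):
    """Build min{1'y : Ax + Iy = b, x >= 0, y >= 0}, enforcing b >= 0."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    neg = b < 0
    A[neg] *= -1                     # flip rows so that b >= 0
    b[neg] *= -1
    m, n = A.shape
    A_aux = np.hstack([A, np.eye(m)])              # [A | I]
    c_aux = np.concatenate([np.zeros(n), np.ones(m)])
    basis = list(range(n, n + m))    # the artificials form the initial basis
    return A_aux, b, c_aux, basis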



Lexicography

Definition. A vector u ∈ Rn is said to be lexicographically larger (or smaller) than another vector v ∈ Rn if u ≠ v and the first nonzero component of u − v is positive (or negative, respectively). Symbolically we write

u >L v or u <L v.

Examples: (0,2,3,0) >L (0,2,1,4), and (0,4,5,0) <L (1,2,1,2).

When u >L 0, we say that u is lexicographically positive.


Lexicographic pivot rule

1 Choose an entering column Aj arbitrarily, as long as its reduced cost c̄j is negative. Let u = B−1Aj be the j-th column of the tableau.

2 For each i with ui > 0, divide the i-th row of the tableau (including the entry in the zeroth column) by ui and choose the lexicographically smallest row. If row k is lexicographically smallest, then the k-th basic variable xB(k) exits the basis.


Lexicographic pivot rule: example

Say the zeroth row has been omitted and the pivot column is the third column (j = 3):

1 | 0  5  3  ···
2 | 4  6 −1  ···
3 | 0  7  9  ···

Dividing the rows with ui > 0 by ui gives

1/3 | 0  5/3  1  ···
(row 2 is skipped, since u2 < 0)
1/3 | 0  7/9  1  ···

There is a tie between xB(1) and xB(3) for the leaving variable. But the third row is lexicographically smallest, as 7/9 < 5/3, so xB(3) exits the basis.

The lexicographic rule always leads to a unique choice for the leaving variable. (A sketch of this selection in code follows.)
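A small sketch of the lexicographic ratio test, reproducing the example above (the row layout, with the zeroth column first, is the tableau convention used here):

import numpy as np

rows = np.array([[1.0, 0, 5, 3],     # (x_B(i) | row of B^{-1}A), zeroth row omitted
                 [2.0, 4, 6, -1],
                 [3.0, 0, 7, 9]])
u = rows[:, 3]                       # pivot column entries (j = 3)

candidates = [i for i in range(len(u)) if u[i] > 0]
scaled = {i: tuple(rows[i] / u[i]) for i in candidates}
k = min(candidates, key=lambda i: scaled[i])   # tuples compare lexicographically
print("row", k + 1, "leaves")        # prints: row 3 leaves, since 7/9 < 5/3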


Anticycling theorem for the lexicographic pivot rule

Theorem. Suppose that the simplex algorithm starts with all the rows of the simplex tableau, other than the zeroth row, lexicographically positive, and that the lexicographic pivoting rule is followed. Then:

1 Every row of the simplex tableau, other than the zeroth row, remains lexicographically positive throughout the algorithm.

2 The zeroth row strictly increases lexicographically at each iteration.

3 The simplex method terminates after a finite number of iterations.


Bland’s pivot rule

1 Find the smallest j for which the reduced cost c̄j is negative and have the column Aj enter the basis.

2 Out of all xi tied in the minimum ratio test for the leaving variable, choose the one with the smallest value of i.

Remark. "Something" has to be monotonically increasing or decreasing to prove finite convergence. In the nondegenerate case, the objective function value is monotonically decreasing; under the lexicographic rule, the zeroth row is monotonically increasing. Under Bland's rule, if variable xj enters the basis, it cannot leave until some xr, r ≥ j + 1, that was nonbasic when xj entered, also enters the basis. If this holds, cycling cannot occur: in a cycle, any variable that enters must also exit, so there would exist some highest-indexed variable that enters and leaves the basis, contradicting the monotonicity imposed by Bland's rule. (A small sketch of both selection rules follows.)
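The two selection rules are simple enough to state in code; this is a hedged sketch, with function names and the tie tolerance chosen for illustration:

def bland_entering(c_bar, tol=1e-9):
    # smallest index j with a negative reduced cost; None means optimal
    for j, cj in enumerate(c_bar):
        if cj < -tol:
            return j
    return None

def bland_leaving(x_B, u, basis, tol=1e-9):
    # minimum ratio test; among tied rows pick the smallest variable index
    ratios = [(x_B[i] / u[i], i) for i in range(len(u)) if u[i] > tol]
    theta = min(r for r, _ in ratios)
    tied = [(basis[i], i) for r, i in ratios if r - theta <= tol]
    return min(tied)[1]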


Other topics

• The big-M method

• Column geometry and the simplex method

• Computational efficiency of the simplex method

• Is there a polynomial time algorithm for LP?

• Is there a strongly-polynomial time algorithm for LP?

• Complexity of detecting degeneracy

• Is there an "efficient" pivot rule that makes the simplex method polynomially bounded?

• Simplex implementation using product form of the inverse

• Simplex method for bounded variables

• Formal proofs of anticycling rules

• The diameter of polyhedra and the Hirsch conjecture (recently disproved, but the motivation behind the conjecture is still relevant)



The primal problem

• Consider the standard form problem min{c′x : Ax = b, x ≥ 0}. We will call this the primal problem. Suppose an optimal solution x∗ exists.

• Consider the relaxed problem in which the constraint Ax = b is replaced by a penalty p′(b − Ax), where p is a price vector of the same dimension as b. That is, we consider min{c′x + p′(b − Ax) : x ≥ 0}. Let g(p) be the optimal cost of the relaxed problem, as a function of p. Since x∗ is still feasible, we have

g(p) = min_{x≥0} [c′x + p′(b − Ax)] ≤ c′x∗ + p′(b − Ax∗) = c′x∗.

• Thus, each p leads to a lower bound g(p) on the optimal cost c′x∗. Hence, the best (i.e., largest) lower bound can be obtained by solving the unconstrained optimization problem

max_p g(p).

This is called the dual problem. (A numeric sketch of these lower bounds follows.)
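A quick numeric sketch of these lower bounds on hypothetical data (the problem below is an assumption for the example; the case analysis for g(p) is derived on the next slide):

import numpy as np

A = np.array([[1.0, 1.0, 1.0]])      # one equality constraint: x1 + x2 + x3 = 2
b = np.array([2.0])
c = np.array([3.0, 2.0, 4.0])

def g(p):
    reduced = c - A.T @ p            # the vector c' - p'A
    return p @ b if np.all(reduced >= 0) else -np.inf

print(g(np.array([0.0])))            # 0.0: a valid but weak lower bound
print(g(np.array([2.0])))            # 4.0: matches the optimal cost here
print(g(np.array([5.0])))            # -inf: some component of c' - p'A is negative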


The dual problem

g(p) = min_{x≥0} [c′x + p′(b − Ax)] = p′b + min_{x≥0} (c′ − p′A)x,

where

min_{x≥0} (c′ − p′A)x = 0 if c′ − p′A ≥ 0′, and −∞ otherwise.

Since we are interested in the p for which the lower bound is largest, we need only consider p for which the lower bound is not trivial (−∞). Hence, the dual problem is equivalent to the linear programming problem

max{p′b : p′A ≤ c′}.

The main duality theorem says that when the primal has an optimal solution x∗, the dual has an optimal solution p∗ such that c′x∗ = p∗′b. That is, when the price p for violating the constraints is chosen to be p∗, the option of violating the constraints Ax = b has no value.


Various forms of primal-dual pairs

min{c′x : Ax = b, x ≥ 0} and max{p′b : p′A ≤ c′};

min{c′x : Ax ≥ b} and max{p′b : p′A = c′, p ≥ 0}.

Remark. "The dual of the dual is the primal."



Weak duality

Theorem. If x is a feasible solution to the primal (min) problem and p is a feasible solution to the dual (max) problem, then

p′b ≤ c′x.

Corollary. (a) If the optimal cost in the primal is unbounded (−∞), then the dual problem must be infeasible. (b) If the optimal cost in the dual is unbounded (+∞), then the primal problem must be infeasible.

Corollary. If x and p are primal and dual feasible, respectively, and p′b = c′x, then x and p are optimal solutions to the primal and dual, respectively.


Proof. For any x and p define ui = pi(a′ix − bi) and vj = (cj − p′Aj)xj. If x and p are primal and dual feasible, we have ui ≥ 0 for all i and vj ≥ 0 for all j. Note that

∑_i ui = p′Ax − p′b ≥ 0 and ∑_j vj = c′x − p′Ax ≥ 0,

so c′x − p′b ≥ 0.


Strong Duality

Theorem. If a linear programming problem has an optimal solution, so does its dual, and the respective optimal costs are equal.

Proof. Consider min{c′x : Ax = b, x ≥ 0}. Assume that A has full row rank and that an optimal solution exists. Let B be an optimal basis, as identified at the termination of the simplex method. Then xB = B−1b gives the optimal basic variable values and c′ − c′BB−1A ≥ 0′. Let p′ = c′BB−1. We then have p′A ≤ c′, which shows p is feasible to the dual

max{p′b : p′A ≤ c′}.

Furthermore, p′b = c′BB−1b = c′BxB, which equals the optimal primal cost; by the corollary to weak duality, p is therefore dual optimal and the two optimal costs are equal. (A numeric sketch of this construction follows.)
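The construction in the proof is easy to verify numerically; in this hedged sketch the data and the optimal basis are assumptions made for a small example (two structural variables plus two slacks):

import numpy as np

A = np.array([[1.0, 1.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([-3.0, -2.0, 0.0, 0.0])
basis = [0, 1]                        # an optimal basis for this data

B_inv = np.linalg.inv(A[:, basis])
x_B = B_inv @ b                       # optimal basic variable values (2, 2)
p = c[basis] @ B_inv                  # p' = c_B' B^{-1}

assert np.all(c - p @ A >= -1e-9)     # dual feasibility: p'A <= c'
assert np.all(x_B >= -1e-9)           # primal feasibility
print(p @ b, c[basis] @ x_B)          # -10.0 -10.0: equal optimal costs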


Primal and dual: possible outcomes (rows: primal; columns: dual)

                 Finite optimum   Unbounded    Infeasible
Finite optimum   Possible         Impossible   Impossible
Unbounded        Impossible       Impossible   Possible
Infeasible       Impossible       Possible     Possible


Complementary slackness theorem

Theorem. Let x and p be feasible solutions to the primal and dual problem, respectively. The vectors x and p are optimal solutions for the two respective problems if and only if

pi(a′ix − bi) = 0 for all i, and (cj − p′Aj)xj = 0 for all j.

Proof. Let ui = pi(a′ix − bi) and vj = (cj − p′Aj)xj. As before, since x and p are feasible to the primal and dual, ui ≥ 0, vj ≥ 0, and c′x − p′b = ∑_i ui + ∑_j vj. By strong duality, if x and p are optimal, then c′x = p′b and hence each ui and vj is zero. Conversely, if each ui and vj is zero, we have c′x = p′b, and by weak duality x and p are optimal to the respective problems. (A numeric check follows.)
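A small numeric check of the conditions, reusing the example data and optimal pair from the strong duality sketch (all of it assumed for illustration):

import numpy as np

A = np.array([[1.0, 1.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([-3.0, -2.0, 0.0, 0.0])
x = np.array([2.0, 2.0, 0.0, 0.0])    # primal optimal
p = np.array([-1.0, -1.0])            # dual optimal

u = p * (A @ x - b)                   # u_i = p_i (a_i'x - b_i); zero since Ax = b
v = (c - p @ A) * x                   # v_j = (c_j - p'A_j) x_j
print(u, v)                           # all zeros, certifying optimality of x and p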


The geometric view

• Consider the primal min{c′x : a′ix ≥ bi, i = 1, ..., m}. Suppose x ∈ Rn and rank(A) = n. The dual is max{p′b : ∑_{i=1}^m piai = c, p ≥ 0}. Suppose I ⊆ {1, ..., m} with |I| = n and ai, i ∈ I, linearly independent.

• The system a′ix = bi, i ∈ I, has a unique solution, say xI, which is a basic solution to the primal. Assume xI is nondegenerate, that is, a′ixI ≠ bi for all i ∉ I. For p ∈ Rm to be optimal to the dual and xI to be optimal to the primal, we need:

a′ixI ≥ bi for all i (primal feasibility) (4)
pi = 0 for all i ∉ I (complementary slackness) (5)
∑_{i=1}^m piai = c (dual feasibility) (6)
p ≥ 0 (dual feasibility) (7)


The geometric view

• With complementary slackness enforced, the conditions require c to be expressed as a nonnegative linear combination of the ai, i ∈ I; that is, ∑_{i∈I} piai = c.

• If xI is a nondegenerate optimal solution, then with complementary slackness enforced we have a unique dual solution pI ≥ 0, and the objective c can be expressed as a nonnegative combination of the active constraints at xI (these are linearly independent, as xI is nondegenerate).

• If x∗ is a degenerate basic solution to the primal, there can be several subsets I that correspond to n linearly independent active constraints at x∗. Different choices of I lead to different systems ∑_{i∈I} piai = c, possibly resulting in different dual basic solutions pI.

• If for some I, pI is dual feasible and x∗ is primal feasible, they are optimal to the respective problems, as complementary slackness has also been imposed in finding pI.


Optimal dual variables as marginal costs

• Consider the standard form problem min{c′x : Ax = b, x ≥ 0}, where A has full row rank and there is a nondegenerate optimal BFS x∗ found by the simplex method. Let B be the corresponding basis; by nondegeneracy, xB = B−1b > 0.

• Let b be perturbed by a vector d such that B−1(b + d) > 0. Then the current basis is feasible for the perturbed problem. The reduced cost vector c′ − c′BB−1A ≥ 0′ is unchanged, so the old basis is optimal for the perturbed problem, with new optimal basic values given by B−1(b + d).

• The new optimal cost is given by c′BB−1(b + d) = p′(b + d), where p′ = c′BB−1 is an optimal solution to the dual. A small change d in the RHS vector b thus changes the optimal cost by p′d.

• This implies that each component pi can be interpreted as the marginal cost (shadow price): the rate of change of the optimal cost with respect to the RHS bi. (A numeric sketch follows.)
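A numeric sketch of this interpretation, reusing the example data and optimal basis from the strong duality sketch (all assumed for illustration):

import numpy as np

A = np.array([[1.0, 1.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([-3.0, -2.0, 0.0, 0.0])
basis = [0, 1]

B_inv = np.linalg.inv(A[:, basis])
p = c[basis] @ B_inv                  # optimal dual vector (-1, -1)
d = np.array([0.1, 0.0])              # a small perturbation of b
assert np.all(B_inv @ (b + d) > 0)    # the same basis remains optimal

old_cost = c[basis] @ (B_inv @ b)
new_cost = c[basis] @ (B_inv @ (b + d))
print(new_cost - old_cost, p @ d)     # both equal p'd = -0.1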



The dual simplex method: Motivation

• The simplex method starts with a primal feasible solution and maintains primal feasibility through the iterations. It terminates when c′ − c′BB−1A ≥ 0′.

• With p′ = c′BB−1, the simplex method can be seen to terminate when p′A ≤ c′, that is, when the complementary dual solution becomes feasible. Such an algorithm is called a primal algorithm.

• If we start with and maintain dual feasibility, and terminate at primal feasibility, the algorithm is referred to as a dual algorithm.

• This section develops a dual simplex method for solving min{c′x : Ax = b, x ≥ 0} with full row rank A. It will then be shown that this algorithm also solves the dual of the standard form LP.


The dual simplex method: Full tableau implementation

−c′BB−1b | c1 − c′BB−1A1 ··· cn − c′BB−1An
B−1b     | B−1A1 ··· B−1An

• We assume that c′ − c′BB−1A ≥ 0′, that is, p′A ≤ c′ where p′ = c′BB−1. But we do not require B−1b ≥ 0, so we are at a basic solution that is not necessarily feasible to the primal.

• Note that p is a feasible solution to the dual with cost p′b = c′BB−1b.

• If B−1b ≥ 0, then xB = B−1b is a feasible solution whose objective value is c′BxB = c′BB−1b. Thus we have primal and dual feasible solutions with the same objective value; hence they are optimal to the respective problems.

• If some basic variable is negative, we perform a pivot operation.


The dual simplex method: Pivots

• Find some k such that xB(k) < 0 and let that row be the pivot row. It is of the form (xB(k), v1, ..., vn), where vi is the k-th component of B−1Ai.

• For each i with vi < 0, if such an i exists, form the ratio c̄i/|vi|. Let j be an index minimizing the ratio; vj is the pivot element:

c̄j/|vj| = min{ c̄i/|vi| : vi < 0 }.

• Now we carry out a change of basis, replacing xB(k) with xj. Note that xj must be nonbasic. The pivot operation is carried out replacing AB(k) in the basis with Aj, as in the case of the primal simplex method.

• In particular, the new reduced costs are given by c̄i + vi c̄j/|vj|. The ratio rule ensures they remain nonnegative, that is, dual feasibility is maintained.


The dual simplex method: Termination

• We know that the reduced cost of the pivot column is nonnegative. Suppose that c̄j > 0. Then the (0,0) entry of the tableau becomes −c′BB−1b + xB(k) c̄j/|vj|, which decreases (recall xB(k) < 0); equivalently, the dual objective value of the new solution has strictly increased.

• As long as the reduced cost of the entering variable is positive in every iteration, the dual cost monotonically increases and no basis will ever be repeated. The algorithm must eventually terminate in one of two ways: (a) B−1b ≥ 0, and we have an optimal solution; (b) all entries (v1, ..., vn) of the pivot row are nonnegative, and hence we have no pivot element. This implies that the dual cost can be driven to +∞ and hence the primal is infeasible.


An iteration of the dual simplex method

1 Start with an initial tableau that is dual feasible (all reduced costs nonnegative).

2 If B−1b ≥ 0, we have an optimal BFS; terminate. Otherwise choose k such that xB(k) < 0.

3 Consider the k-th row of the tableau, with entries (xB(k), v1, ..., vn); this is the pivot row. If vi ≥ 0 for all i, the dual is unbounded (the primal is infeasible); terminate.

4 For each i such that vi < 0, compute the ratio c̄i/|vi| and let j be an index minimizing this ratio. Column AB(k) exits the basis, and Aj enters.

5 Add to each row of the tableau a multiple of the pivot row so that the pivot element becomes 1 and all other entries in the pivot column become 0. (A sketch of this pivot follows.)
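A hedged numpy sketch of one such pivot on the full tableau (the layout, with a zeroth row of reduced costs and a zeroth column B−1b, follows the convention above; T is assumed to be a float array and is modified in place):

import numpy as np

def dual_simplex_pivot(T, basis):
    """One dual simplex pivot; T is (m+1) x (n+1), basis maps rows to variables."""
    x_B = T[1:, 0]
    if np.all(x_B >= 0):
        return "optimal"
    k = 1 + int(np.argmin(x_B))           # pivot row: some x_B(k) < 0
    v = T[k, 1:]
    if np.all(v >= 0):
        return "primal infeasible"        # the dual cost can be driven to +inf
    ratios = np.full(v.shape, np.inf)
    neg = v < 0
    ratios[neg] = T[0, 1:][neg] / (-v[neg])   # c_bar_i / |v_i| for v_i < 0
    j = int(np.argmin(ratios))                # entering variable (0-based)
    basis[k - 1] = j
    T[k] /= T[k, j + 1]                   # same row operations as the primal pivot
    for i in range(T.shape[0]):
        if i != k:
            T[i] -= T[i, j + 1] * T[k]
    return "pivoted"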


Geometry of the dual simplex method

min{x1 + x2 : x1 + 2x2 ≥ 2, x1 ≥ 1, x1 ≥ 0, x2 ≥ 0}

max{2p1 + p2 : p1 + p2 ≤ 1, 2p1 ≤ 1, p1 ≥ 0, p2 ≥ 0}

(A numeric check of this pair follows.)
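A quick numeric check of this primal-dual pair, a sketch using scipy.optimize.linprog (the ≥ constraints are negated into ≤ form, and the max is negated into a min; both optimal values come out to 1.5, as strong duality predicts):

from scipy.optimize import linprog

# primal: min x1 + x2 s.t. x1 + 2x2 >= 2, x1 >= 1, x >= 0
primal = linprog(c=[1, 1], A_ub=[[-1, -2], [-1, 0]], b_ub=[-2, -1],
                 bounds=[(0, None)] * 2, method="highs")
# dual: max 2p1 + p2 s.t. p1 + p2 <= 1, 2p1 <= 1, p >= 0
dual = linprog(c=[-2, -1], A_ub=[[1, 1], [2, 0]], b_ub=[1, 1],
               bounds=[(0, None)] * 2, method="highs")
print(primal.fun, -dual.fun)          # 1.5 1.5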


Anticycling

Recall that termination is guaranteed if the reduced cost of the entering variable in each iteration is positive. If c̄j = 0 in the pivot column, the zeroth row of the tableau does not change and the dual objective remains the same; the algorithm could then cycle.

Definition (Lexicographic pivot rule).

1 Choose any row k with xB(k) < 0 as the pivot row.

2 Determine the pivot column Aj as follows: for each column with vi < 0, divide all of its entries by |vi|, and then choose the lexicographically smallest column. If several columns tie for lexicographically smallest, choose the one with the smallest index.

Remark. If the dual simplex method is initialized so that every column (c̄j, B−1Aj) is lexicographically positive, and the above pivot rule is used, the method terminates in a finite number of steps.


Duality and degeneracy

min{3x1 + x2 : x1 + x2 ≥ 2, 2x1 − x2 ≥ 0, x1 ≥ 0, x2 ≥ 0}

max{2p1 : p1 + 2p2 ≤ 3, p1 − p2 ≤ 1, p1 ≥ 0, p2 ≥ 0}


Duality and degeneracy: Summary

1 Every basis determines a basic solution xB = B−1b to the primal and a corresponding basic solution p′ = c′BB−1 to the dual.

2 The dual basic solution is feasible (p′A ≤ c′) if and only if all of the reduced costs are nonnegative (c̄′ = c′ − c′BB−1A ≥ 0′).

3 Under this dual basic solution, the reduced costs that are equal to zero correspond to active constraints in the dual problem.

4 This dual basic solution is degenerate if and only if some nonbasic variable has a zero reduced cost.



Farkas’ Lemma

Theorem. Let A be a matrix of dimensions m × n and let b be a vector in Rm. Then exactly one of the following two alternatives holds:

1 There exists some x ≥ 0 such that Ax = b.

2 There exists some vector p such that p′A ≥ 0′ and p′b < 0.

[Figure: b lies outside the cone generated by the columns A1, A2, A3; the vector p defines a separating hyperplane {z : p′z = 0}.]


Proof. If there exists x ≥ 0 satisfying Ax = b, and if p′A ≥ 0′, then p′b = p′Ax ≥ 0, which shows the second alternative cannot hold.

Suppose there is no x ≥ 0 satisfying Ax = b, and consider the pair of LPs

max{0′x : Ax = b, x ≥ 0} and min{p′b : p′A ≥ 0′}.

The max problem is infeasible, which implies that the min problem (its dual) is either infeasible or unbounded. Since p = 0 is a feasible solution to the min problem, it must be unbounded. Hence, there exists p such that p′A ≥ 0′ and p′b < 0. (A tiny numeric illustration follows.)
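A tiny numeric illustration on hypothetical data, where the second alternative holds:

import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0])
p = np.array([1.0, -1.0])             # a certificate of infeasibility

print(p @ A)                          # [0. 1.] >= 0'
print(p @ b)                          # -1.0 < 0, so no x >= 0 with Ax = b exists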


From separating hyperplanes to duality

1 Weierstrass' Theorem. If f : Rn → R is a continuous function, and if S is a nonempty, closed, and bounded subset of Rn, then there exists some x∗ ∈ S such that f(x∗) ≤ f(x) for all x ∈ S. Similarly, there is a y∗ ∈ S such that f(y∗) ≥ f(x) for all x ∈ S.

2 Separation Theorem. Let S be a nonempty, closed, convex set in Rn and let x∗ ∈ Rn be a point not in S. Then there exists some vector c ∈ Rn such that c′x∗ < c′x for all x ∈ S.

3 These theorems can be used to develop Farkas' Lemma without invoking the simplex method. Note that Farkas' Lemma predates linear programming and the simplex method.

4 Farkas' Lemma can be used to prove the strong duality theorem of LP without invoking the simplex termination condition to supply an optimal basis.

5 Farkas' Lemma (and other theorems of alternatives), the separation theorem, and Weierstrass' theorem are fundamental to the development of optimization theory in general, not just LP.



Polyhedral cones

Definition. A set C ⊆ Rn is a cone if λx ∈ C for all λ ≥ 0 and all x ∈ C. A polyhedron of the form P = {x ∈ Rn : Ax ≥ 0} is a nonempty cone and is called a polyhedral cone.

Theorem. Let C ⊆ Rn be the polyhedral cone defined by the constraints a′ix ≥ 0, i = 1, ..., m. Then the following are equivalent:

1 The zero vector is an extreme point of C.

2 The cone C does not contain a line.

3 There exist n vectors out of the family a1, ..., am that are linearly independent.


Rays and recession cones

Consider a nonempty polyhedron P = {x ∈ Rn : Ax ≥ b} and fix some y ∈ P. The recession cone at y is the set of directions d along which we can move indefinitely away from y without leaving P. Formally,

recession-cone(y) = {d ∈ Rn : A(y + λd) ≥ b for all λ ≥ 0}.

This is the same as the polyhedral cone {d ∈ Rn : Ad ≥ 0}, independent of y. The nonzero elements of the recession cone are called the rays of the polyhedron P.

For the standard form polyhedron P = {x ∈ Rn : Ax = b, x ≥ 0}, the recession cone is given by

{d ∈ Rn : Ad = 0, d ≥ 0}.


Extreme rays

Definition.

1 A nonzero element x of a polyhedral cone C is called an extreme ray if there are n − 1 linearly independent constraints that are active at x.

2 An extreme ray of the recession cone associated with a nonempty polyhedron P is also called an extreme ray of P.

Remark. Note that any positive scalar multiple of an extreme ray is also an extreme ray. We say two extreme rays are equivalent if one is a positive multiple of the other. For this to happen, they must correspond to the same n − 1 linearly independent active constraints; these can yield at most two nonequivalent extreme rays (d and −d). Thus the number of nonequivalent extreme rays of a polyhedron is finite. When we refer to a complete set of extreme rays of a polyhedron P, we mean a finite collection of extreme rays of P containing exactly one representative from each equivalence class.


Characterizing unbounded LPs

Theorem. Consider the problem of minimizing c′x over a pointed polyhedral cone C = {x ∈ Rn : a′ix ≥ 0, i = 1, ..., m}. The linear program is unbounded if and only if some extreme ray d of C satisfies c′d < 0.

Theorem. Consider the problem of minimizing c′x subject to Ax ≥ b, and assume that the feasible set has at least one extreme point. The linear program is unbounded if and only if some extreme ray d of the feasible set satisfies c′d < 0.


Resolution theorem

Theorem
Let P = {x ∈ Rn : Ax ≥ b} be a nonempty polyhedron with at least one extreme point. Let x1, . . . ,xk be the extreme points, and let w1, . . . ,wr be a complete set of extreme rays of P. Let

Q = { λ1x1 + · · · + λkxk + θ1w1 + · · · + θrwr : λi ≥ 0, θj ≥ 0, λ1 + · · · + λk = 1 }.

Then, Q = P.

Corollary
A nonempty bounded polyhedron is the convex hull of its extreme points.

Corollary
Assume the cone C = {x : Ax ≥ 0} is pointed. Then, every element of C can be expressed as a nonnegative linear combination of extreme rays of C.

Theorem
A finitely generated set is a polyhedron. In particular, the convex hull of finitely many vectors is a bounded polyhedron.
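For a concrete instance of the theorem (an assumed illustrative example, not one from the text), take P = {(x1,x2) : x1 + x2 ≥ 1, x1 ≥ 0, x2 ≥ 0}. Its extreme points are (1,0) and (0,1), and a complete set of extreme rays is {(1,0), (0,1)}. The point (3,2) ∈ P then decomposes as (3,2) = 1·(1,0) + 0·(0,1) + 2·(1,0) + 2·(0,1), with λ1 = 1, λ2 = 0 summing to 1 and θ1 = θ2 = 2.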


Sensitivity analysis

• The goal is to study the impact of minor variations in the input to an LP that has already been solved to optimality.

• This is not a satisfactory approach to handling uncertainty in data, or to obtaining robust solutions; stochastic programming and robust optimization approaches aim to formally address the issue of uncertainty.

• Sensitivity analysis is, however, an excellent tool for understanding the optimality principles of LP.

• Traditional sensitivity reports document tolerance ranges for each cj and bi within which the current basis remains optimal, when the elements of c or b vary one at a time while the others are held fixed. These results need not hold when two or more parameters vary simultaneously.


Assume a standard form LP with linearly independent constraints. The optimal tableau (primal or dual simplex) presents an optimal basis B. That is, B−1b ≥ 0 and c̄′ = c′ − c′BB−1A ≥ 0′:

−c′BB−1b | c1 − c′BB−1A1 · · · cn − c′BB−1An
B−1b     | B−1A1 · · · B−1An

Maintaining the optimality conditions under variations gives rise to the techniques employed in sensitivity analysis. The types of variations of interest to us are changes in the elements of A, b, c, and the addition of a new constraint or a new variable. We look for conditions under which the current basis remains optimal and, when those conditions are violated, for a re-optimization procedure that can be used.
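These conditions are mechanical to verify. A minimal sketch (assuming A, b, c are numpy arrays and the candidate basis is given by a list of column indices; the function name is ours):

```python
import numpy as np

def is_optimal_basis(A, b, c, basic, tol=1e-9):
    """Check primal feasibility B^{-1} b >= 0 and nonnegative reduced
    costs c_bar' = c' - c_B' B^{-1} A >= 0' for the given basis."""
    B_inv = np.linalg.inv(A[:, basic])
    x_B = B_inv @ b                     # basic variable values
    c_bar = c - (c[basic] @ B_inv) @ A  # reduced cost vector
    return bool(np.all(x_B >= -tol) and np.all(c_bar >= -tol))
```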


Adding a new variable

Suppose a new variable xn+1 is added, resulting in the new problem

min{c′x + cn+1xn+1 : Ax + An+1xn+1 = b, x ≥ 0, xn+1 ≥ 0}.

With xn+1 as nonbasic, the current basis remains optimal for the new problem if B−1b ≥ 0 (satisfied), c̄ ≥ 0 (satisfied), and c̄n+1 = cn+1 − c′BB−1An+1 ≥ 0 (needs to be verified). If the last condition is met, (x∗,0) is optimal for the new problem; otherwise we enter An+1 into the basis and start primal simplex iterations. Typically, starting from the previously optimal B leads to the new optimal solution in fewer iterations than optimizing the new problem from scratch.
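In code, once the dual vector p′ = c′BB−1 of the current basis is at hand, the test on the new column is a one-liner (a sketch; the function name is ours):

```python
def new_variable_keeps_basis_optimal(p, A_new, c_new):
    """Reduced cost of the new column: c_{n+1} - p'A_{n+1} >= 0 means
    (x*, 0) stays optimal; otherwise A_{n+1} is the entering column."""
    return c_new - p @ A_new >= 0
```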


Adding a new inequality constraint

Suppose a new inequality constraint a′m+1x ≥ bm+1 is added to the problem. If the optimal solution x∗ satisfies this constraint, it is optimal for the new problem as well. If not, we introduce a new slack variable to obtain a problem in standard form:

min{c′x + 0xn+1 : Ax + 0xn+1 = b, a′m+1x − xn+1 = bm+1, x ≥ 0, xn+1 ≥ 0}.

If B was the optimal basis, form a new basis B̄ by selecting xn+1 to be basic. So (writing 2×2 block matrices with rows separated by “;”),

B̄ = [ B 0 ; a′ −1 ],

where a′ contains the components of a′m+1 corresponding to the m basic variables.


Adding a new inequality constraint

The associated basic solution is (x∗, a′m+1x∗ − bm+1), and the new basis inverse is

B̄−1 = [ B−1 0 ; a′B−1 −1 ].

The new reduced costs are then given by

[c′ 0] − [c′B 0] B̄−1 [ A 0 ; a′m+1 −1 ] = [c̄′ 0],

which continues to be nonnegative. Hence, the basis B̄ is primal infeasible (xn+1 < 0) but dual feasible, permitting us to carry out the dual simplex method to reoptimize. Constructing the new simplex tableau requires

B̄−1 [ A 0 ; a′m+1 −1 ] = [ B−1A 0 ; a′B−1A − a′m+1 1 ].


Adding a new equality constraint

Suppose a new equality constraint a′m+1x = bm+1 is added to the problem. Assume the optimal solution x∗ to the original problem violates this constraint; w.l.o.g., assume that a′m+1x∗ > bm+1. Introduce the auxiliary primal problem

min{c′x + Mxn+1 : Ax = b, a′m+1x − xn+1 = bm+1, x ≥ 0, xn+1 ≥ 0},

where M is a large positive constant. A primal feasible solution is obtained by picking the basic variables of the optimal solution together with xn+1. The basis B̄ can be constructed as before. In this case, however, the basis is primal feasible but not dual feasible, requiring us to reoptimize by the primal simplex method.


Changes in the requirement vector b

Suppose some component bi is changed to bi + δ, so the new RHS is b + δei. The reduced costs are unaffected by this change, and we only need to check primal feasibility: the current basis remains optimal if B−1(b + δei) ≥ 0.
If the i-th column of B−1 is g = (β1i, β2i, . . . ,βmi), this amounts to checking xB + δg ≥ 0. Equivalently,

max_{j : βji > 0} ( −xB(j)/βji ) ≤ δ ≤ min_{j : βji < 0} ( −xB(j)/βji ).

For δ outside this range, the current basis is primal infeasible and dual feasible, and can be reoptimized by applying the dual simplex method.
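A minimal sketch of this ranging computation (assuming B−1 and xB are available as numpy arrays; the helper name is ours):

```python
import numpy as np

def rhs_range(B_inv, x_B, i):
    """Allowable delta for b_i -> b_i + delta so that the current basis
    stays optimal: x_B + delta * g >= 0, with g the i-th column of B^{-1}."""
    g = B_inv[:, i]
    lo = max((-x_B[j] / g[j] for j in range(len(g)) if g[j] > 0), default=-np.inf)
    hi = min((-x_B[j] / g[j] for j in range(len(g)) if g[j] < 0), default=np.inf)
    return lo, hi
```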


Changes in the cost vector c

Suppose some cost coefficient becomes cj + δ. Primal feasibility is unaffected; only the reduced costs c̄′ = c′ − c′BB−1A can be affected.
If j is nonbasic, cB is unaffected and we only need cj + δ − c′BB−1Aj ≥ 0, i.e., δ ≥ −c̄j.
If j is the k-th basic variable, that is, B(k) = j, then cB + δek is the new basic objective coefficient vector, so all reduced costs can be affected. We need

(cB + δek)′B−1Ai ≤ ci, ∀ i ≠ j.

Letting q be the k-th row of B−1A, this becomes δqi ≤ c̄i, ∀ i ≠ j.
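The corresponding ranging computation for a basic cost coefficient, again as a sketch (assuming the reduced cost vector c̄ and the k-th row q of B−1A are available; names are ours):

```python
import numpy as np

def basic_cost_range(c_bar, q, j, tol=1e-12):
    """Allowable delta for c_j -> c_j + delta when j = B(k):
    we need delta * q_i <= c_bar_i for all i != j."""
    lo, hi = -np.inf, np.inf
    for i in range(len(q)):
        if i == j:
            continue
        if q[i] > tol:
            hi = min(hi, c_bar[i] / q[i])
        elif q[i] < -tol:
            lo = max(lo, c_bar[i] / q[i])
    return lo, hi
```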


Changes in a nonbasic column of A

Suppose some entry aij of the j-th column Aj is changed to aij + δ. If Aj is nonbasic, B does not change and primal feasibility is unaffected; only the associated reduced cost changes, leading to (with p′ = c′BB−1)

cj − p′(Aj + δei) ≥ 0, i.e., c̄j − δpi ≥ 0.


Changes in a basic column of A

Suppose some entry aij of the j-th column Aj is changed to aij + δ. If Aj is basic, B changes, and both primal and dual feasibility could be affected.
It can still be shown that the basis remains optimal if δ lies in some interval, but the analysis is considerably more tedious and won't be pursued here; see Exercise 5.2.


Global dependence on b

Consider
P(b) = {x : Ax = b, x ≥ 0},
S = {b : P(b) ≠ ∅}.

Note that S is equivalently described as

S = {Ax : x ≥ 0};

in particular, S is a convex set. Consider F : S −→ R defined by

F(b) = min{c′x : x ∈ P(b)},

the optimal cost as a function of b. We further assume that the dual feasible set {p : p′A ≤ c′} is nonempty, so that F(b) is finite for all b ∈ S.


Convexity of F(b)

Theorem
The optimal cost F(b) is a convex function of b on the set S.

Proof.
Let b1, b2 ∈ S, and let xi ∈ P(bi), i = 1,2, be such that F(bi) = c′xi. Fix λ ∈ [0,1], and let y = λx1 + (1−λ)x2. Note that y ∈ P(λb1 + (1−λ)b2). Therefore,

F(λb1 + (1−λ)b2) ≤ c′y = λF(b1) + (1−λ)F(b2).

Remark
Since A has full row rank, the dual feasible region has at least one extreme point. The above result is also evident from the observation

F(b) = max_{i=1,...,N} (pi)′b, ∀ b ∈ S,

where p1, . . . ,pN are the dual extreme points.


Differentiability of F(b)

• F(b) = max_{i=1,...,N} (pi)′b is a piecewise linear function, and hence it is differentiable in the regions where F is linear (where the maximum is uniquely achieved). In such regions, F(b) = (pi)′b, where pi is the corresponding dual optimal solution.

• Suppose b is such that there exists a nondegenerate optimal BFS, and let B be the associated optimal basis. Then xB = B−1b > 0, and for every b̃ close enough to b we have B−1b̃ > 0, so B is still an optimal basis. Thus,

F(b̃) = c′BB−1b̃ = p′b̃, ∀ b̃ with B−1b̃ > 0,

and hence F is linear in the vicinity of b, with the gradient of F at b given by p.

• For those values of b at which F is not differentiable, the dual problem does not have a unique optimal solution, and this implies that every optimal solution to the primal is degenerate.


Variation along a direction

Let b∗ and d be fixed vectors and define b = b∗ + θd. Let

f(θ) = F(b∗ + θd) = max_{i=1,...,N} (pi)′(b∗ + θd), for all θ with b∗ + θd ∈ S.


Subgradients

Definition
Let F be a convex function defined on a convex set S, and let b∗ ∈ S. We say that a vector p is a subgradient of F at b∗ if

F(b) ≥ F(b∗) + p′(b − b∗), ∀ b ∈ S.

Theorem
Suppose the LP min{c′x : Ax = b∗, x ≥ 0} is feasible and the optimal cost is finite. Then, a vector p is an optimal solution to the dual problem if and only if it is a subgradient of the optimal cost function F at the point b∗.
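The theorem can be checked numerically on a small assumed example by solving the dual explicitly and testing the subgradient inequality at a few perturbed right-hand sides (a sketch using scipy.optimize.linprog; the data are ours, not from the text):

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
c = np.array([1.0, 3.0, 1.0])
b_star = np.array([2.0, 1.0])

def F(b):
    """Optimal cost of min{c'x : Ax = b, x >= 0} (+inf if infeasible)."""
    res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * A.shape[1])
    return res.fun if res.status == 0 else np.inf

# A dual optimal p: max{p'b* : p'A <= c'}  <=>  min{-b*'p : A'p <= c}, p free.
dual = linprog(-b_star, A_ub=A.T, b_ub=c, bounds=[(None, None)] * A.shape[0])
p = dual.x

# Subgradient inequality F(b) >= F(b*) + p'(b - b*):
for db in ([0.5, 0.0], [-0.3, 0.4], [0.0, -0.2]):
    b = b_star + np.array(db)
    assert F(b) >= F(b_star) + p @ (b - b_star) - 1e-7
```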


Subgradients of a PWL convex function


Global dependence on c

• Consider the dual feasible set

Q(c) = {p : p′A ≤ c′}, and let T = {c : Q(c) ≠ ∅}.

It can be verified that T is a convex set.

• If c ∉ T, the dual is infeasible while the primal is feasible by assumption; hence the primal must be unbounded. If c ∈ T, both primal and dual are feasible, and hence both have finite optimal costs. Thus, the optimal primal cost is finite if and only if c ∈ T. For all c ∈ T,

G(c) = min{c′x : Ax = b, x ≥ 0} = min_{i=1,...,N} c′xi,

where x1, . . . ,xN are the extreme points of P = {x : Ax = b, x ≥ 0}.

• G(c) is a PWL concave function. Fix c∗ and let G(c∗) be uniquely achieved by xi. There is a neighborhood of c∗ from which we can pick c such that G(c) is uniquely achieved by the same xi; locally, G(c) = c′xi. At values of c leading to multiple primal optimal solutions, G has a breakpoint.


Global dependence on c

Theorem
Consider a feasible linear programming problem in standard form.

1 The set T of all c for which the optimal cost is finite is convex.

2 The optimal cost function G(c) is a concave function of c on the set T.

3 If for some value of c the primal problem has a unique optimal solution x∗, then G is linear in the vicinity of c and its gradient is equal to x∗.


Parametric programming

For fixed A, b, c and a vector d ∈ Rn, the goal of parametric programming is to solve, as a function of θ, the problem

g(θ) = min (c + θd)′x
       s.t. Ax = b
            x ≥ 0.

We assume that the feasible region is nonempty. Then, for θ such that g(θ) is finite, we have

g(θ) = min_{i=1,...,N} (c + θd)′xi,

where x1, . . . ,xN are the extreme points of the feasible set. Hence g(θ) is PWL and concave in θ (a minimum of finitely many linear functions of θ).


Illustration of g(θ)


Example

g(θ) = min (−3 + 2θ)x1 + (3 − θ)x2 + x3
       s.t. x1 + 2x2 − 3x3 ≤ 5
            2x1 + x2 − 4x3 ≤ 7
            x1, x2, x3 ≥ 0

With slacks x4, x5, the initial tableau is

           x1         x2       x3    x4    x5
    0   | −3 + 2θ    3 − θ     1     0     0
x4 = 5  |  1          2       −3     1     0
x5 = 7  |  2          1       −4     0     1

All reduced costs are nonnegative for 1.5 ≤ θ ≤ 3, so g(θ) = 0 for 1.5 ≤ θ ≤ 3. If θ > 3, x2 enters (and x4 leaves):

               x1            x2    x3            x4            x5
−7.5 + 2.5θ | −4.5 + 2.5θ    0     5.5 − 1.5θ   −1.5 + 0.5θ    0
x2 = 2.5    |  0.5           1    −1.5           0.5           0
x5 = 4.5    |  1.5           0    −2.5          −0.5           1

g(θ) = 7.5 − 2.5θ for 3 ≤ θ ≤ 11/3 (= 5.5/1.5). If θ > 11/3, x3 enters; no positive pivot element in its column means g(θ) = −∞ for θ > 11/3.


Example -contd.

Now we go back to the original tableau and consider θ < 1.5.

           x1         x2       x3    x4    x5
    0   | −3 + 2θ    3 − θ     1     0     0
x4 = 5  |  1          2       −3     1     0
x5 = 7  |  2          1       −4     0     1

If θ < 1.5, x1 enters (and x5 leaves):

              x1    x2          x3         x4    x5
10.5 − 7θ  |  0     4.5 − 2θ   −5 + 4θ     0     1.5 − θ
x4 = 1.5   |  0     1.5        −1          1    −0.5
x1 = 3.5   |  1     0.5        −2          0     0.5

g(θ) = −10.5 + 7θ for 5/4 ≤ θ ≤ 3/2. If θ < 5/4, x3 enters; no positive pivot element in its column means g(θ) = −∞ for θ < 5/4.


g(θ) =
  −∞,            θ < 5/4,
  −10.5 + 7θ,    5/4 ≤ θ ≤ 3/2,
  0,             3/2 ≤ θ ≤ 3,
  7.5 − 2.5θ,    3 ≤ θ ≤ 11/3,
  −∞,            θ > 11/3.
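The piecewise description can be cross-checked by solving the LP directly over a grid of θ values (a sketch using scipy.optimize.linprog on the example's inequality form):

```python
import numpy as np
from scipy.optimize import linprog

A_ub = np.array([[1.0, 2.0, -3.0],
                 [2.0, 1.0, -4.0]])
b_ub = np.array([5.0, 7.0])

def g(theta):
    c = np.array([-3 + 2 * theta, 3 - theta, 1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
    return res.fun if res.status == 0 else -np.inf  # status 3: unbounded LP

for theta in (1.0, 1.3, 2.0, 3.2, 4.0):
    print(theta, g(theta))
# Expected: -inf, -1.4 (= -10.5 + 7(1.3)), 0.0, -0.5 (= 7.5 - 2.5(3.2)), -inf
```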


Obtaining g(θ)

• Given a basis, the set of θ for which this basis is optimal is a closed interval, say θ1 ≤ θ ≤ θ2.

• Suppose c̄j becomes negative for θ > θ2 and equals 0 at θ = θ2. Consider what happens if xj is entered.

• If u = B−1Aj ≤ 0, then g(θ) = −∞ for θ > θ2. Otherwise, carry out a change of basis at θ = θ2. The new basis remains optimal at θ = θ2 and cannot be optimal for θ < θ2; the range of values for which the new basis is optimal is of the form θ2 ≤ θ ≤ θ3, for some θ3.

• Similarly, we can obtain a sequence of bases, with the i-th basis being optimal for [θi, θi+1]. Note that the i-th basis cannot be optimal for θ > θi+1. If θi+1 > θi for all i, we will not encounter the same optimal basis twice.

• If θi = θi+1, there is a possibility that we may cycle through bases that are optimal for θ = θi = θi+1. This can only happen in the presence of degeneracy in the primal problem.


Motivation

Consider the standard form problem

min c′x
s.t. Ax = b
     x ≥ 0

under the usual assumption of linearly independent rows. Suppose n is so large (think exponential in m) that it is impossible to generate and store the entire A matrix. What now?
Recall that the memory requirements of the revised simplex method are O(m2), independent of n. Given the current basis (inverse) and the entering column, the new B−1 is computed by elementary row operations.
If we can somehow intelligently select the entering variable without explicitly constructing all the columns of A and computing all the reduced costs, we can still carry out revised simplex iterations without ever having to “construct” the entire formulation. What we need is the answer to

min_{i=1,...,n} c̄i.


A basic column generation scheme

The LP will be solved by a sequence of master iterations. A master iteration consists of solving the restricted master problem (RMP) and the column generating subproblem (CGSP).

(RMP)  min ∑_{i∈I} ci xi
       s.t. ∑_{i∈I} Ai xi = b
            x ≥ 0

In the first master iteration, I is initialized to a set of columns that form an initial BFS for the RMP, possibly together with other columns that we think are “important” in the problem context. We solve the RMP to optimality. We then search for a variable xj with c̄j < 0 by solving

(CGSP)  min_i c̄i;

if none is found, then the BFS optimal for the RMP is also optimal for the original LP, and we terminate. Otherwise, we have some j with c̄j < 0 and the column Aj; we add j to I and start the next master iteration, solving the RMP for the new I.


Remarks on the basic scheme

• In the basic scheme we have simply modified Dantzig's classical variable selection rule to prioritize variables in I. That is, we first choose entering columns by the most negative reduced cost among the variables in I. Once those are all nonnegative (which happens when the RMP is optimal), we seek variables outside I (by solving the CGSP).

• Assume the original LP has an optimal solution. If the RMP is not solved to optimality in some master iteration, but the CGSP is solved to optimality at every iteration (with i varying over all of 1, . . . ,n), we are still guaranteed an optimal solution when the master iterations terminate, since at termination we meet the sufficient condition for optimality.

• However, if we do not solve the CGSP to optimality and employ a heuristic approach instead (why might we?), we cannot guarantee optimality for the original LP at termination. If the heuristic solution to the CGSP identifies an entering column, we enter it and proceed; but if the heuristic fails to identify an entering column, that does not mean none exists!

• You could also terminate the RMP solution process early if the RMP has run many iterations and you suspect there are columns outside the RMP that must eventually enter to form an optimal basis.


Variations on the basic scheme

• In the basic scheme, I is initialized to just the basic columns, or to the basic columns plus some extras. With each solution of the CGSP we add a column, but never drop any. Note that I then grows with each iteration, and can get really huge if it takes many master iterations to solve the problem. That can end up defeating our original purpose of memory conservation!

• At the other extreme, I is initialized to the initial basic columns, and the entering column found by the CGSP replaces a currently basic column in I. Solving the RMP then takes a single iteration, as it holds only the m basic columns and the new entering column Aj. This is light on memory and quick in solving the RMP, but the burden shifts to the CGSP: for instance, if a previously basic column has to reenter, we must now solve the CGSP to find that out instead of simply computing its reduced cost with the formula.

• An intermediate option is to drop columns from I if they have remained nonbasic for more than a threshold number of iterations.


Some final comments

• All the variations identified on the previous slide are guaranteed to terminate in the absence of degeneracy. In the presence of degeneracy, cycling can be avoided by using the lexicographic rule.

• It helps to identify a “pool of columns” in advance that we think would contribute to the optimal basis. This typically happens when the instance you are solving today was solved recently and the “past” optimal solution is known. That solution may no longer be optimal because some of the data has changed, but the corresponding columns are prime candidates for the “pool.”

• Our ability to solve the CGSP effectively is key to solving a problem using column generation; in the absence of a meaningful approach to solving the CGSP, we cannot use this framework. The explicit form of the CGSP depends on the original problem context. We will see two cases: first, the cutting stock problem, whose CGSP is a knapsack problem; second, the classical Dantzig-Wolfe decomposition of a block diagonal form, where the CGSP is another LP.


The cutting stock problem

• Given stock rolls of width W, determine how to cut these stock rolls so that bi units of length wi, for i = 1, . . . ,m, are produced and the total number of stock rolls used is minimized.

• A numerical example: W = 20, and 150 units of length 5, 200 units of length 7, and 300 units of length 9 are needed. That is, w1 = 5, w2 = 7, w3 = 9, b1 = 150, b2 = 200, b3 = 300, and m = 3 demand types are present.

• Since a stock roll can be cut according to any pattern, formulating the problem requires enumerating all the cut patterns (knife settings). A pattern is represented by a column vector Aj whose i-th entry aij is the number of units of type i produced when one stock roll is cut according to this pattern.


Cutting patterns

For example, the first pattern in the figure (four pieces of length 5) is represented by the column [4,0,0]′, and the second pattern (two pieces of length 5 and one of length 9) is represented by the column [2,0,1]′. Note that the second pattern has a wastage of 1 unit of roll width.

[Figure: two cut patterns of a width-20 roll: 5 ft + 5 ft + 5 ft + 5 ft, and 5 ft + 5 ft + 9 ft + 1 ft of waste.]

Note that Aj = [a1j, . . . ,amj]′ is an admissible knife setting if the aij, i = 1, . . . ,m, are all nonnegative integers and

∑_{i=1}^{m} aij wi ≤ W.


Formulation

Suppose there are j = 1, . . . ,n possible patterns according to which a stock roll could be cut, and let these columns populate the m×n matrix A. With xj denoting the number of stock rolls cut according to pattern j, the cutting stock problem can be formulated as:

min ∑_{j=1}^{n} xj
s.t. ∑_{j=1}^{n} Aj xj = b
     xj ≥ 0, j = 1, . . . ,n, and integer.

We will ignore the integrality restriction on the xj variables and solve the problem as a linear program. Once an optimal solution is found, a feasible solution to the integer program can be obtained by rounding up; this uses no more than zIP + m rolls, where zIP is the optimal IP cost.


CGSP for the cutting stock problem

Initializing the RMP for the cutting stock problem is easily accomplished with the initial basis B equal to the identity matrix. Suppose we have optimized the RMP in some iteration, and B is the associated optimal basis. We wish to identify a new column to enter.

• Compute p′ = c′B B−1; note that cB is a vector of 1s.

• Consider cj = 1 − p′Aj; minimizing cj over all j is equivalent to maximizing p′Aj over all j.

• If this maximum is 1, then B is optimal to the original LP; if this maximum is larger than 1, then the maximizing column Aj enters the basis. (The maximum is always at least 1, since p′Aj = 1 for the basic columns.)

• We use the condition ∑_{i=1}^m aij wi ≤ W to optimize over admissible columns/patterns while maximizing p′Aj:

max ∑_{i=1}^m pi ai

s.t. ∑_{i=1}^m wi ai ≤ W

ai ≥ 0 and integer, i = 1,…,m.
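This CGSP is an integer (unbounded) knapsack problem. Assuming integer widths, it can be solved by dynamic programming over the remaining roll width; the sketch below is our own construction, not code from the text.

```python
def knapsack_pricing(widths, duals, W):
    """Integer knapsack DP: maximize sum(duals[i]*a[i]) s.t. sum(widths[i]*a[i]) <= W.

    Returns (best_value, best_pattern). Assumes integer widths and duals >= 0.
    """
    m = len(widths)
    best = [0.0] * (W + 1)        # best[w] = max dual value within width budget w
    choice = [None] * (W + 1)     # last piece type added to achieve best[w]
    for w in range(1, W + 1):
        for i in range(m):
            if widths[i] <= w and best[w - widths[i]] + duals[i] > best[w]:
                best[w] = best[w - widths[i]] + duals[i]
                choice[w] = i
    # Recover the maximizing pattern by walking back through the choices.
    pattern, w = [0] * m, W
    while choice[w] is not None:
        pattern[choice[w]] += 1
        w -= widths[choice[w]]
    return best[W], pattern

# If the returned value exceeds 1, the corresponding pattern prices out
# and its column enters the basis of the restricted master problem.
print(knapsack_pricing([5, 7, 9], [0.25, 0.35, 0.45], 20))
```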


Cutting plane methods

• Consider the dual of the standard form LP, given by max{p′b | p′Ai ≤ ci, i = 1,…,n}. If the standard form LP has a large number of variables, then this dual has a large number of constraints.

• Consider the relaxed problem (RP): max{p′b | p′Ai ≤ ci, i ∈ I}, where I ⊆ {1,…,n}. Note that RP is the dual of the RMP.

• If we solve the relaxed problem to optimality, achieving the optimum at p∗, and if we could somehow verify that the constraints not included in RP are not violated, we can conclude that p∗ is optimal to the original LP.

• Separation problem (SP): min_i {ci − (p∗)′Ai}. If this optimal cost is nonnegative, then we have verified that p∗ violates none of the constraints ignored in RP. If the optimal cost is negative, we have a constraint violated by p∗; we add this constraint to RP and re-solve. The SP is identical to the CGSP.

• It should be apparent that the column generation approach applied to the primal LP is equivalent to the cutting plane approach applied to its dual. As before, successful application of this technique depends on our ability to solve the separation problem efficiently. A sketch of the loop appears below.
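A schematic sketch of the cutting plane loop (our own construction; A, b, c are numpy arrays, the initial relaxed problem is assumed bounded, and the separation step simply scans all columns, whereas in practice it would be a structured problem such as the knapsack CGSP above):

```python
# Cutting plane loop for max{p'b | p'A_i <= c_i, i = 1..n} with n large.
# scipy's linprog minimizes, so the objective is -b; p is a free vector.
import numpy as np
from scipy.optimize import linprog

def cutting_plane(A, b, c, I0, tol=1e-9):
    m, _ = A.shape
    I = list(I0)                              # indices of retained constraints
    while True:
        res = linprog(-b, A_ub=A[:, I].T, b_ub=c[I],
                      bounds=[(None, None)] * m, method="highs")
        p = res.x                             # optimum of the relaxed problem
        viol = c - A.T @ p                    # c_i - p'A_i for every i
        i_min = int(np.argmin(viol))
        if viol[i_min] >= -tol:               # separation found no violated cut
            return p                          # p is optimal for the full dual
        I.append(i_min)                       # add the violated constraint, re-solve
```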


Dantzig-Wolfe decomposition

Consider the LP:

min c′1 x1 + c′2 x2

s.t. D1 x1 + D2 x2 = b0

F1 x1 = b1

F2 x2 = b2

x1 ≥ 0, x1 ∈ R^{n1}

x2 ≥ 0, x2 ∈ R^{n2}

Denote Pi = {x^i ≥ 0 | Fi x^i = bi}, i = 1,2. Assume each Pi is nonempty; let x^{ij}, j ∈ Ji, be the extreme points of Pi and let w^{ik}, k ∈ Ki, be a complete set of nonequivalent extreme rays of Pi. By the resolution theorem, any x^i ∈ Pi can be written as

x^i = ∑_{j∈Ji} λ_{ij} x^{ij} + ∑_{k∈Ki} θ_{ik} w^{ik},

where ∑_{j∈Ji} λ_{ij} = 1, i = 1,2, and λ_{ij}, θ_{ik} ≥ 0 ∀ i, j, k.


Dantzig-Wolfe Reformulation

min c′1 x1 + c′2 x2

s.t. D1 x1 + D2 x2 = b0

x^i ∈ Pi, i = 1,2

Recall that x^i = ∑_{j∈Ji} λ_{ij} x^{ij} + ∑_{k∈Ki} θ_{ik} w^{ik}. Substituting gives:

(DWR) min ∑_{j∈J1} λ_{1j}(c′1 x^{1j}) + ∑_{k∈K1} θ_{1k}(c′1 w^{1k}) + ∑_{j∈J2} λ_{2j}(c′2 x^{2j}) + ∑_{k∈K2} θ_{2k}(c′2 w^{2k})

subject to

∑_{j∈J1} λ_{1j}(D1 x^{1j}) + ∑_{k∈K1} θ_{1k}(D1 w^{1k}) + ∑_{j∈J2} λ_{2j}(D2 x^{2j}) + ∑_{k∈K2} θ_{2k}(D2 w^{2k}) = b0

∑_{j∈J1} λ_{1j} = 1

∑_{j∈J2} λ_{2j} = 1

λ_{ij}, θ_{ik} ≥ 0, ∀ i, j, k


Column generation approach

• The reformulated LP has m0 + 2 constraints and |J1| + |J2| + |K1| + |K2| variables, which could be extremely large. Assuming an initial feasible basis B is available for the DWR, denote the dual vector by p′ = [q′, r1, r2] = c′B B−1.

• The reduced cost of λ_{1j} is given by

(c′1 x^{1j}) − [q′, r1, r2] (D1 x^{1j}, 1, 0)′ = (c′1 − q′D1) x^{1j} − r1.

• The reduced cost of θ_{1k} is given by

(c′1 w^{1k}) − [q′, r1, r2] (D1 w^{1k}, 0, 0)′ = (c′1 − q′D1) w^{1k}.

• Similar expressions hold for λ_{2j} and θ_{2k}.

• The minimum reduced cost over these variables can be determined by solving min_{x^i ∈ Pi} (c′i − q′Di) x^i for i = 1,2. So the CGSP here decomposes into two linear programming problems.


DW CGSP outcomes

• If the optimal cost in the subproblem with i = 1 is unbounded (−∞), then obtain an extreme ray w^{1k} that satisfies (c′1 − q′D1) w^{1k} < 0, and enter the column for θ_{1k} given by (D1 w^{1k}, 0, 0)′.

• If the optimal cost in the subproblem with i = 1 is finite and less than r1, then obtain an extreme point x^{1j} that satisfies (c′1 − q′D1) x^{1j} < r1, and enter the column for λ_{1j} given by (D1 x^{1j}, 1, 0)′.

• If the optimal cost in the subproblem with i = 1 is finite and at least r1, then solve the subproblem with i = 2 to similarly identify an entering column.

• If the optimal cost in the subproblem with i = 2 is also finite and at least r2, then the basis B is optimal to the DWR of the original LP. (A sketch of this pricing step for the bounded case follows below.)
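Here is a sketch of the pricing step for one block (our own construction; scipy's linprog does not report extreme rays, so the unbounded branch is left unimplemented):

```python
# DW pricing for one block: solve min (c - D'q)'x over P = {x >= 0 | F x = b_i}
# and test the optimal cost against the convexity dual r_i.
import numpy as np
from scipy.optimize import linprog

def dw_pricing(c, D, F, bi, q, ri, block, tol=1e-9):
    """Return (entering column, objective coefficient), or None if none exists."""
    res = linprog(c - D.T @ q, A_eq=F, b_eq=bi, method="highs")
    if res.status == 3:
        # Unbounded subproblem: an extreme ray w with (c' - q'D)w < 0 exists,
        # and the column (Dw, 0, 0)' would enter; extracting w is omitted here.
        raise NotImplementedError("extract an extreme ray of P")
    if res.fun < ri - tol:                  # reduced cost of lambda_{ij} < 0
        x = res.x                           # an optimal extreme point of P
        conv = [1.0, 0.0] if block == 1 else [0.0, 1.0]   # convexity rows
        return np.concatenate([D @ x, conv]), float(c @ x)
    return None                             # this block prices out nonnegative
```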


Dantzig-Wolfe Decomposition Algorithm

1 Initialize with m0 + 2 extreme points and extreme rays of P1, P2 that lead to a BFS for DWR, with associated B and p′ = c′B B−1 = [q′, r1, r2].

2 Form and solve the two CGSPs. If neither results in an entering column, B is optimal to DWR. Otherwise, identify the entering variable and form the entering column.

3 Carry out an iteration of the revised simplex method to update B−1 and p; return to step 2.

Remark
The algorithm described above solves the LP

min c′1 x1 + ⋯ + c′t x^t

s.t. D1 x1 + ⋯ + Dt x^t = b0

Fi x^i = bi, i = 1,…,t

x^i ≥ 0, i = 1,…,t

for t = 2. It easily extends to solve an LP with this block-diagonal substructure for any number of blocks t (including t = 1).


Initial BFS for DWR

• Apply Phase I of the simplex method to each of the polyhedra Pi, and find extreme points x^{1,1} and x^{2,1} of P1 and P2, respectively.

• W.l.o.g., assume that D1 x^{1,1} + D2 x^{2,1} ≤ b0. Let y be a vector of auxiliary variables of dimension m0. The auxiliary problem is

min ∑_{t=1}^{m0} yt

s.t. ∑_{i=1,2} ( ∑_{j∈Ji} (Di x^{ij}) λ_{ij} + ∑_{k∈Ki} (Di w^{ik}) θ_{ik} ) + y = b0

∑_{j∈Ji} λ_{ij} = 1, i = 1,2, and λ_{ij}, θ_{ik}, yt ≥ 0 ∀ i, j, k, t.

• A BFS to the auxiliary problem is obtained by letting λ_{1,1} = λ_{2,1} = 1, λ_{ij} = 0 for j ≠ 1, θ_{ik} = 0 ∀ i, k, and y = b0 − D1 x^{1,1} − D2 x^{2,1} (see the sketch below).

• Solve this auxiliary problem by the decomposition algorithm. If the optimal cost is positive, then the DWR master problem is infeasible; if the optimal cost is zero, an optimal solution to the auxiliary problem can be used to construct a BFS for the DWR master problem.
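A tiny sketch (ours) of the starting BFS construction, under the w.l.o.g. assumption above:

```python
# Build the starting BFS of the auxiliary problem from one extreme point
# of each P_i, assuming D1 @ x11 + D2 @ x21 <= b0 componentwise.
import numpy as np

def auxiliary_start(D1, x11, D2, x21, b0):
    y = b0 - D1 @ x11 - D2 @ x21   # auxiliary variables absorb the slack
    assert np.all(y >= 0), "multiply rows by -1 so the assumption holds"
    # lambda_{1,1} = lambda_{2,1} = 1; all other lambdas and thetas are zero.
    return {"lambda_11": 1.0, "lambda_21": 1.0, "y": y}
```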


Bounds on optimal cost

Theorem
Suppose the DWR master problem is feasible and its optimal cost z∗ is finite. Let z be the cost of the feasible solution obtained at some intermediate stage of the decomposition algorithm. Also, let ri be the value of the dual variable associated with the convexity constraint for the i-th subproblem. Finally, let zi be the optimal cost of the i-th subproblem, assumed finite. Then,

z + ∑_i (zi − ri) ≤ z∗ ≤ z.


The affine scaling algorithm

• Consider the standard form LP and let P = {x | Ax = b, x ≥ 0}. We call int(P) = {x ∈ P | x > 0} the interior of P, and its elements are interior points of P.

• Let x0 ∈ int(P), and let S0 be an ellipsoid centered at x0 that is contained within int(P).

• It turns out that solving min_{x∈S0} c′x is easier than solving min_{x∈P} c′x. Suppose x1 is the optimal solution to this problem; let S1 be the next ellipsoid, centered at x1, and repeat this step.

• We would like to do this so that eventually we find the optimal solution to the LP.


Ellipsoids contained in P

Lemma
Let β ∈ (0,1) be a scalar, let y ∈ Rn satisfy y > 0, and let

S(y) = { x ∈ Rn | ∑_{i=1}^n (xi − yi)² / yi² ≤ β² }.

Then x > 0 for every x ∈ S(y).

• Fix y ∈ int(P), and let Y = diag(y1, …, yn). Then x ∈ S(y) ⟺ ‖Y−1(x − y)‖ ≤ β.

• Let S0 = S(y) ∩ {x | Ax = b} = {x | Ax = b, ‖Y−1(x − y)‖ ≤ β}, and we wish to solve min_{x∈S0} c′x. Substituting d = x − y, this problem becomes

min{ c′d | Ad = 0, ‖Y−1d‖ ≤ β }.

• This problem has a closed-form solution …


Lemma
Assume that the rows of A are linearly independent and that c is not a linear combination of the rows of A. Let y be a positive vector. Then an optimal solution d∗ to

min{ c′d | Ad = 0, ‖Y−1d‖ ≤ β }

is given by

d∗ = −β Y²(c − A′p) / ‖Y(c − A′p)‖,

where

p = (AY²A′)−1 AY²c.

Furthermore, the vector x = y + d∗ belongs to P and c′x = c′y − β‖Y(c − A′p)‖ < c′y.

Remark
(1) If d∗ ≥ 0, the feasible set of the LP is unbounded, since x + αd∗ > 0 ∀ α > 0 and Ad∗ = 0; because c′d∗ < 0, this means the LP is unbounded. (2) If y is a nondegenerate BFS, the above formula reduces to p = (B−1)′cB. (A small numerical check follows below.)
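A small numerical check of the lemma (our own construction): compute p and d∗ for a random instance and verify that Ad∗ = 0, ‖Y−1d∗‖ = β, and the cost decreases by β‖Y(c − A′p)‖.

```python
# A numerical check (ours) of the closed-form solution in the lemma.
import numpy as np

rng = np.random.default_rng(0)
m, n, beta = 2, 5, 0.5
A = rng.standard_normal((m, n))
y = rng.uniform(1.0, 2.0, n)              # a positive vector
c = rng.standard_normal(n)
Y = np.diag(y)

p = np.linalg.solve(A @ Y @ Y @ A.T, A @ Y @ Y @ c)
r = c - A.T @ p                           # the vector c - A'p
d = -beta * (Y @ Y @ r) / np.linalg.norm(Y @ r)

print(np.allclose(A @ d, 0))              # d* lies in the nullspace of A
print(np.isclose(np.linalg.norm(np.linalg.solve(Y, d)), beta))   # on the boundary
print(np.isclose(c @ d, -beta * np.linalg.norm(Y @ r)))          # cost decrease
```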


Lemma
Let y and p be primal and dual feasible, respectively, such that c′y − b′p < ε. Let y∗ and p∗ be optimal primal and dual solutions, respectively. Then,

c′y∗ ≤ c′y < c′y∗ + ε,

b′p∗ − ε < b′p ≤ b′p∗.

Remark
The bounded duality gap suggests a termination criterion: stop when we reach ε-optimal primal and dual solutions. The affine scaling algorithm works on the following inputs: (a) the data A, b, c, where A has linearly independent rows; (b) an initial primal feasible interior point solution x0 > 0; (c) an optimality tolerance ε > 0; (d) the parameter β ∈ (0,1).


The affine scaling algorithm

1 Initialization: start with x0 > 0 and k = 0.

2 Computation of dual estimates and reduced cost estimates: let

X^k = diag(x^k_1, …, x^k_n),

p^k = (A X^k X^k A′)−1 A X^k X^k c,

r^k = c − A′p^k.

3 Optimality check: let e = (1,1,…,1)′. If r^k ≥ 0 and e′X^k r^k < ε, then stop; x^k and p^k are ε-optimal.

4 Unboundedness check: if −X^k X^k r^k ≥ 0, then stop; the optimal cost is unbounded.

5 Next iterate: let

x^{k+1} = x^k − β X^k X^k r^k / ‖X^k r^k‖,

set k ← k + 1, and return to step 2. (A numpy sketch of the full iteration follows.)
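To tie the steps together, here is a minimal numpy sketch of the iteration (our own construction; it assumes A has full row rank and that a strictly positive feasible x0 is supplied, and it does not address degeneracy or convergence guarantees):

```python
# A minimal sketch of the affine scaling iteration described above.
import numpy as np

def affine_scaling(A, b, c, x0, beta=0.5, eps=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    assert np.allclose(A @ x, b) and np.all(x > 0)   # strictly feasible start
    for _ in range(max_iter):
        X2 = x * x                                   # diagonal of X^k X^k
        p = np.linalg.solve(A * X2 @ A.T, A @ (X2 * c))   # dual estimate p^k
        r = c - A.T @ p                              # reduced cost estimate r^k
        if np.all(r >= 0) and (x * r).sum() < eps:   # e'X^k r^k < eps
            return x, p                              # eps-optimal pair
        if np.all(-X2 * r >= 0):
            raise ValueError("optimal cost is unbounded")
        x = x - beta * (X2 * r) / np.linalg.norm(x * r)   # next interior point
    raise RuntimeError("iteration limit reached")

# Example: min x1 + 2*x2 s.t. x1 + x2 = 1, x >= 0 (optimum at x = (1, 0)).
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
x, p = affine_scaling(A, b, c, x0=np.array([0.5, 0.5]))
print(x, p)    # x approaches (1, 0); the dual estimate approaches 1
```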