Ray Pörn, principal lecturer
Åbo Akademi University
Novia University of Applied Sciences
Semidefinite Programming – Basics and Applications
Content
What is semidefinite programming (SDP)?
How to represent different constraints
Representability
Relaxation techniques
Reformulation strategies
Convex optimization
General form:

minimize    f(x)
subject to  x ∈ X

where f is a convex function and X is a convex set.

Why is convex optimization important?
Many practical problems can be posed as convex programs
Local optimum = global optimum
Hard non-convex problems can be approximated with convex ones
Efficient (polynomial-time) algorithms exist
Basic linear algebra and notation
Definition: A symmetric matrix A is called positive semidefinite (PSD) if xᵀAx ≥ 0 for all vectors x ∈ R^n.

The cone of positive semidefinite matrices
The cone of PSD matrices is a convex set, so X ≽ 0 is a convex constraint.
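A tiny numerical illustration of the definition above (numpy assumed; the matrix is an arbitrary example, not from the slides): a symmetric matrix is PSD exactly when all of its eigenvalues are non-negative.

```python
import numpy as np

A = np.array([[2., -1.],
              [-1., 2.]])                 # symmetric example matrix (an assumption)
print(np.linalg.eigvalsh(A))              # [1. 3.] -> all eigenvalues >= 0, so A is PSD

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
print(x @ A @ x >= 0)                     # x^T A x >= 0 holds for any x
```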
Hierarchy of optimization problems

[Diagram] Optimization problems split into convex programs ("nice") and non-convex programs ("not so nice"); non-convex programs are attacked by convexification or by approximation with convex ones. Among convex programs, the conic programs form the nested hierarchy LP ⊆ QP ⊆ QCQP ⊆ SOCP ⊆ SDP.
Intro to semidefinite programming

SDP    semidefinite program
SOCP   second order conic program
QCQP   convex quadratically constrained quadratic program
QP     convex quadratic program
LP     linear program
(linear) SDP
Minimize a linear function over the intersection of an affine set and the cone of positive semidefinite matrices (the SDP analogue of the LP standard form):

minimize    trace(CX)
subject to  trace(A_1 X) = b_1
            ⋮
            trace(A_m X) = b_m
            X ≽ 0
Intro to SDP
The constant matrices C, A_1, …, A_m are assumed to be symmetric.
Different notations for the objective:

trace(CX) = Tr(CX) = ⟨C, X⟩ = C • X = Σ_{i=1}^n Σ_{j=1}^n c_ij x_ij

trace(CX) is the natural inner product ⟨C, X⟩ on the space of symmetric matrices, and it is a linear function of the variables x_ij.
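A quick numerical check of this identity (numpy assumed; random symmetric matrices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((3, 3)); C = (C + C.T) / 2     # random symmetric C
X = rng.standard_normal((3, 3)); X = (X + X.T) / 2     # random symmetric X

print(np.trace(C @ X))      # trace(CX)
print(np.sum(C * X))        # sum_ij c_ij * x_ij  -- the same number
```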
Semidefinite programming
Standard form of SDP (the primal problem), with a matrix variable X, linear equality constraints and one conic constraint (X is PSD):

minimize  trace(CX)
s.t.      trace(A_i X) = a_i,   i = 1, …, m
          X ≽ 0

Equivalent form of SDP (connected to the dual problem), with a vector variable x, one Linear Matrix Inequality (LMI) and a set of linear equalities:

minimize  cᵀx
s.t.      Bx = b
          B_0 + x_1 B_1 + x_2 B_2 + ⋯ + x_n B_n ≽ 0
Example
SDP with data

C = [1 2; 2 3],   A_1 = [1 −1; −1 2],   A_2 = [2 3; 3 0],   a = (8, 6)ᵀ,

and symmetric matrix variable X = [x1 x2; x2 x3].

trace(CX)    = x1 + 4x2 + 3x3
trace(A_1 X) = x1 − 2x2 + 2x3
trace(A_2 X) = 2x1 + 6x2

Decompose the matrix:  X = [x1 x2; x2 x3] = x1[1 0; 0 0] + x2[0 1; 1 0] + x3[0 0; 0 1]

Define:  c = (1, 4, 3)ᵀ,   b = a,   B = [1 −2 2; 2 6 0]

Primal (matrix) form:
minimize  x1 + 4x2 + 3x3
s.t.      x1 − 2x2 + 2x3 = 8
          2x1 + 6x2 = 6
          [x1 x2; x2 x3] ≽ 0

LMI form:
minimize  x1 + 4x2 + 3x3
s.t.      x1 − 2x2 + 2x3 = 8
          2x1 + 6x2 = 6
          x1[1 0; 0 0] + x2[0 1; 1 0] + x3[0 0; 0 1] ≽ 0
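The example can be checked numerically. Below is a minimal sketch using CVXPY (the slides do not prescribe any software; CVXPY and its default SDP solver are assumptions) that solves both the matrix form and the LMI form and confirms they give the same optimal value.

```python
import cvxpy as cp
import numpy as np

C  = np.array([[1., 2.], [2., 3.]])
A1 = np.array([[1., -1.], [-1., 2.]])
A2 = np.array([[2., 3.], [3., 0.]])
a  = np.array([8., 6.])

# Primal (matrix) form: symmetric variable X with trace equality constraints.
X = cp.Variable((2, 2), symmetric=True)
primal = cp.Problem(cp.Minimize(cp.trace(C @ X)),
                    [cp.trace(A1 @ X) == a[0],
                     cp.trace(A2 @ X) == a[1],
                     X >> 0])
primal.solve()

# LMI form: vector variable (x1, x2, x3) with one linear matrix inequality.
x = cp.Variable(3)
lmi = cp.Problem(cp.Minimize(x[0] + 4 * x[1] + 3 * x[2]),
                 [x[0] - 2 * x[1] + 2 * x[2] == 8,
                  2 * x[0] + 6 * x[1] == 6,
                  cp.bmat([[x[0], x[1]], [x[1], x[2]]]) >> 0])
lmi.solve()

print(primal.value, lmi.value)   # the two optimal values coincide
print(X.value)                   # optimal X* = [[x1, x2], [x2, x3]]
```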
Representability - LP
A set of linear inequality constraints:

2x + 3y ≤ 10,   −x + 2y ≤ 5
⟺  10 − 2x − 3y ≥ 0,   5 + x − 2y ≥ 0
⟺  [10 − 2x − 3y, 0; 0, 5 + x − 2y] ≽ 0

since a diagonal matrix is PSD iff all diagonal elements are non-negative. Hence any LP can be written as an SDP.
Representability – a convex quadratic constraint
A convex quadratic constraint:  4x² − 10x + 2 ≤ 0

Recall:  [a, b; b, c] ≽ 0  ⟺  a ≥ 0, c ≥ 0 and ac − b² ≥ 0.

4x² − 10x + 2 ≤ 0  ⟺  1·(10x − 2) − (2x)² ≥ 0  ⟺  [1, 2x; 2x, 10x − 2] ≽ 0
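A quick numerical check of the convex case (CVXPY and numpy assumed, not part of the slides): maximizing x subject to the LMI recovers the larger root of 4x² − 10x + 2 = 0.

```python
import cvxpy as cp
import numpy as np

x = cp.Variable()
M = cp.bmat([[1, 2 * x],
             [2 * x, 10 * x - 2]])          # the LMI representing 4x^2 - 10x + 2 <= 0
prob = cp.Problem(cp.Maximize(x), [M >> 0])
prob.solve()

print(x.value)                               # largest feasible x
print(max(np.roots([4., -10., 2.])))         # larger root of 4x^2 - 10x + 2 = 0 (about 2.28), same value
```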
What about a concave quadratic constraint, such as −4x² − 10x + 2 ≤ 0?
Representability - QP
A general convex quadratic constraint (Q ≽ 0, factored as Q = RᵀR):  xᵀQx + qᵀx + q_0 ≤ 0

xᵀQx + qᵀx + q_0 ≤ 0
⟺  xᵀRᵀRx + qᵀx + q_0 ≤ 0
⟺  (Rx)ᵀ(Rx) + qᵀx + q_0 ≤ 0
⟺  −qᵀx − q_0 − (Rx)ᵀ I⁻¹ (Rx) ≥ 0
⟺  [I, Rx; (Rx)ᵀ, −qᵀx − q_0] ≽ 0        (Schur complement)
Representability – Convex QCQP 1
[Slide figure: a convex QCQP reformulation; the constraints, nonlinear in x, are rewritten so that they are linear in x and the epigraph variable θ.]

Representability – Convex QCQP 2 (another way)
[Slide figure: an alternative reformulation; the constraints, nonlinear in x, become linear in x, W and θ.]
Representability – SOCP
A second order conic constraint:

‖Qx + d‖ ≤ gᵀx + h
⟺  ‖Qx + d‖² ≤ (gᵀx + h)²  and  gᵀx + h ≥ 0
⟺  (gᵀx + h)² − (Qx + d)ᵀ(Qx + d) ≥ 0  and  gᵀx + h ≥ 0
⟺  (gᵀx + h) − (Qx + d)ᵀ [(gᵀx + h)I]⁻¹ (Qx + d) ≥ 0
⟺  [(gᵀx + h)I, Qx + d; (Qx + d)ᵀ, gᵀx + h] ≽ 0
Representability – The Schur complement
Nonlinear matrix inequalities can often be turned into LMIs via the Schur complement.

General matrix inequality (Schur complement lemma), for symmetric blocks with R ≻ 0:
[Q, S; Sᵀ, R] ≽ 0  ⟺  Q − S R⁻¹ Sᵀ ≽ 0

Riccati inequality (P variable; A, B, Q fixed, R fixed and positive definite):
AᵀP + PA + PBR⁻¹BᵀP + Q ≼ 0  ⟺  [−AᵀP − PA − Q, PB; BᵀP, R] ≽ 0

Fractional quadratic inequalities can be handled in the same way.
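A small numerical illustration of the Schur complement lemma (numpy assumed; random data, not from the slides): with R positive definite, the block matrix is PSD exactly when the Schur complement is PSD.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
S = rng.standard_normal((n, m))
R = rng.standard_normal((m, m)); R = R @ R.T + np.eye(m)   # R positive definite
Q = rng.standard_normal((n, n)); Q = (Q + Q.T) / 2         # symmetric Q

block = np.block([[Q, S], [S.T, R]])
schur = Q - S @ np.linalg.solve(R, S.T)                    # Q - S R^{-1} S^T

# The two minimum eigenvalues always have the same sign: both are tests for PSD-ness.
print(np.linalg.eigvalsh(block).min(), np.linalg.eigvalsh(schur).min())
```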
Application: SDP relaxation of MAX-CUT
MAX-CUT problem
Consider an undirected graph G = (V, E) with n vertices and edge weights w_ij ≥ 0 (w_ij = w_ji) for all edges (i, j) ∈ E. Find a subset S of V such that the sum of the weights of the edges that cross from S to V \ S (the complement of S) is maximized.
MAX-CUT (combinatorial) formulation

maximize  (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij (1 − x_i x_j)
s.t.      x_j ∈ {−1, 1},   j = 1, …, n

MAX-CUT is a non-convex, quadratic combinatorial problem. The problem is NP-hard (very difficult to solve to optimality for large instances).
Let X = xxᵀ, that is X_ij = x_i x_j, and note that x_j ∈ {−1, 1} ⟺ x_j² = 1 ⟺ X_jj = 1.
This leads to an equivalent formulation of MAX-CUT:

maximize  (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij  −  (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij X_ij
s.t.      X_jj = 1,   j = 1, …, n
          X = xxᵀ

We relax the problematic rank-1 constraint X = xxᵀ to X ≽ 0 and denote by W the matrix with elements w_ij.
SDP relaxation of MAX-CUT

maximize  (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij  −  (1/4) trace(WX)
s.t.      X_jj = 1,   j = 1, …, n
          X ≽ 0
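A minimal CVXPY sketch of this relaxation on a small random weighted graph (CVXPY, its default SDP solver and the random data are assumptions, not part of the slides):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
n = 6
W = rng.integers(0, 4, size=(n, n)).astype(float)
W = np.triu(W, 1)
W = W + W.T                                   # symmetric edge weights, zero diagonal

X = cp.Variable((n, n), symmetric=True)
objective = cp.Maximize(0.25 * W.sum() - 0.25 * cp.trace(W @ X))
prob = cp.Problem(objective, [cp.diag(X) == 1, X >> 0])
prob.solve()

print("SDP upper bound on the maximum cut:", prob.value)
```

The optimal value is an upper bound on the true MAX-CUT value; a rank-one solution X = xxᵀ would correspond to an actual cut.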
Classification
Separation of two sets of points in 𝑹𝒏
Set 1: X = {𝑥1, 𝑥2, … , 𝑥𝑁} Set 2: Y = {𝑦1, 𝑦2, … , 𝑦𝑀}
Find a function f(x) that separates X and Y as well as possible, that is,
f(x_i) > 0 and f(y_i) < 0 for as many points as possible.
Linear discrimination: 𝑓 𝑥 = 𝑎𝑇𝑥 + 𝑏 (LP)
Classification
Quadratic convex discrimination: 𝑓 𝑥 = 𝑥𝑇𝑃𝑥 + 𝑞𝑇𝑥 + 𝑟
Assumption: the separation surface is ellipsoidal (P ≺ 0) and contains all points of X and none of the points of Y. This leads to an SDP feasibility problem:
minimize 1𝑇𝑢 + 1𝑇𝑣
s.t.
SDP representability
We have seen, for example, that:
convex quadratic constraints: xᵀPx + qᵀx + r ≤ 0, and
second order cone constraints: ‖Ax + b‖ ≤ cᵀx + d
can be represented by linear semidefinite constraints, also called LMIs.
These constraints (sets) are semidefinite representable (SDr).
Definition: A convex function is called SDr if its epigraph is SDr.
We will see that a wide variety of convex functions admit an SDr. This means
that the modeling power of SDP is far greater than that of LP, QP, QCQP and
SOCP.
Eigenvalue formulations using SDP
SDP allows modeling of functions that involve eigenvalues and singular values of matrices.

Largest eigenvalue of a symmetric matrix:  λ_max(A) ≤ t  ⟺  tI − A ≽ 0,  so  λ_max(A) = min { t : tI − A ≽ 0 }
Spectral norm of a symmetric matrix:  ‖A‖ = max_i |λ_i(A)| ≤ t  ⟺  −tI ≼ A ≼ tI
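A small check of the epigraph formulation of the largest eigenvalue (CVXPY and numpy assumed; random symmetric matrix, not from the slides):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2            # random symmetric matrix

t = cp.Variable()
prob = cp.Problem(cp.Minimize(t), [t * np.eye(5) - A >> 0])   # lambda_max(A) = min{ t : tI - A PSD }
prob.solve()

print(prob.value, np.linalg.eigvalsh(A).max())                # the two values agree (up to solver tolerance)
```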
Singular value formulations using SDP
The representation of singular values of a general rectangular matrix A follows from:

The largest singular value of a matrix (the operator norm):  σ_max(A) ≤ t  ⟺  [tI, A; Aᵀ, tI] ≽ 0

The sum of the p largest singular values is also SDr.
Combinatorial problem: 0-1 Quadratic Program (0-1 QP)
A standard 0-1 QP has the form:

min   xᵀQx + qᵀx
s.t.  Ax = a
      Bx ≤ b
      x ∈ {0, 1}^n

Q, A, B are matrices and q, a, b are vectors of appropriate dimensions.

Some applications include:
Max-Cut of a graph (unconstrained)
Knapsack problems (inequality constrained)
Graph bipartitioning
Task allocation
Quadratic assignment problems
Coulomb glass
Boolean least squares
SDP relaxation of 0-1 QP

A quadratic expression in x is linear in X:  xᵀQx = Q • X = Σ_i Σ_j Q_ij X_ij
Binary condition:  x_i ∈ {0, 1}  ⟺  x_i² − x_i = 0  ⟺  X_ii = x_i  ⟺  diag(X) = x
Relaxation of the binary x into a positive semidefinite matrix variable X:
X = xxᵀ  →  X − xxᵀ ≽ 0  ⟺  [1, xᵀ; x, X] ≽ 0

Semidefinite relaxation:

min   Q • X + qᵀx
s.t.  Ax = a
      Bx ≤ b
      diag(X) = x
      [1, xᵀ; x, X] ≽ 0

It gives a tight lower bound on the 0-1 QP.
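A sketch of this lifted relaxation for an unconstrained 0-1 QP (q = 0 and no matrices A, B), using the 4×4 matrix Q from the convexification example later in the slides; CVXPY and its default SDP solver are assumptions:

```python
import cvxpy as cp
import numpy as np

Q = np.array([[ 1.,  2., -3.,  2.],
              [ 2.,  2., -3.,  4.],
              [-3., -3.,  2.,  0.],
              [ 2.,  4.,  0., -2.]])
n = Q.shape[0]

# Lifted variable Y = [[1, x^T], [x, X]] PSD with diag(X) = x (the binary relaxation).
Y = cp.Variable((n + 1, n + 1), symmetric=True)
constraints = [Y[0, 0] == 1,
               cp.diag(Y)[1:] == Y[0, 1:],            # diag(X) = x
               Y >> 0]
prob = cp.Problem(cp.Minimize(cp.trace(Q @ Y[1:, 1:])), constraints)
prob.solve()

print("SDP lower bound on min x^T Q x over {0,1}^4:", prob.value)
```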
Convexification of 0-1 QPs
Basic approach: if Q is indefinite, add sufficiently large quadratic terms to the diagonal and subtract the same amount from the linear terms.
Recall that: 𝑥𝑖 ∈ 0,1 ⇔ 𝑥𝑖2 = 𝑥𝑖
Example

f(x) = xᵀ [1 3; 3 2] x = x1² + 6x1x2 + 2x2²                                      (indefinite)

f(x) = xᵀ [3 3; 3 5] x − (2, 3) x = 3x1² + 6x1x2 + 5x2² − 2x1 − 3x2              (positive semidefinite)

The two expressions define the same function on {0,1} × {0,1}.
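A quick numpy check (numpy assumed) that the two expressions coincide at all four binary points:

```python
import itertools
import numpy as np

Q1 = np.array([[1., 3.], [3., 2.]])          # indefinite
Q2 = np.array([[3., 3.], [3., 5.]])          # positive semidefinite
q2 = np.array([2., 3.])

for x in itertools.product([0., 1.], repeat=2):
    x = np.array(x)
    print(x, x @ Q1 @ x, x @ Q2 @ x - q2 @ x)   # the last two numbers agree at every binary point
```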
Convexification of 0-1 QPs
The following are equivalent (𝑄 = 𝑄𝑇):
The quadratic function 𝑓 𝑥 = 𝑥𝑇𝑄𝑥 is convex on 𝑅𝑛.
The matrix 𝑄 is positive semidefinite (𝑄 ≽ 0).
All eigenvalues of 𝑄 are non-negative (𝜆𝑖 ≥ 0).
A sufficient condition for convexity: a diagonally dominant symmetric matrix is PSD.

Definition: A matrix Q is diagonally dominant if  Q_ii ≥ Σ_{j≠i} |Q_ij|  for all i.
Convexification of 0-1 QPs
Example: a) diagonal dominance, b) minimum eigenvalue

min xᵀQx   s.t.  x ∈ {0,1}^4,   with

Q = [  1   2  −3   2
       2   2  −3   4
      −3  −3   2   0
       2   4   0  −2 ],          eig(Q) = (−5.17, −1.04, 0.95, 8.26)

a) Diagonal dominance

Q̂ = [  7   2  −3   2
       2   9  −3   4
      −3  −3   6   0
       2   4   0   6 ],   q̂ = (6, 7, 4, 8)ᵀ,   eig(Q̂) = (1.66, 4.90, 6.88, 14.56)

min xᵀQ̂x − q̂ᵀx   s.t.  x ∈ [0,1]^4        optimal value = −5.93

b) Minimum eigenvalue

Q̂ = [ 6.17   2     −3     2
       2     7.17  −3     4
      −3    −3      7.17  0
       2     4      0     3.17 ],   q̂ = (5.17, 5.17, 5.17, 5.17)ᵀ,   eig(Q̂) = (0, 4.13, 6.12, 13.43)

min xᵀQ̂x − q̂ᵀx   s.t.  x ∈ [0,1]^4        optimal value = −5.34
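The two diagonal perturbations can be reproduced directly with numpy (a sketch; numpy assumed):

```python
import numpy as np

Q = np.array([[ 1.,  2., -3.,  2.],
              [ 2.,  2., -3.,  4.],
              [-3., -3.,  2.,  0.],
              [ 2.,  4.,  0., -2.]])

# a) diagonal dominance: raise each diagonal entry to the sum of |off-diagonal| entries in its row
row_offdiag = np.abs(Q).sum(axis=1) - np.abs(np.diag(Q))
delta_a = row_offdiag - np.diag(Q)            # amount added to the diagonal
Qa = Q + np.diag(delta_a)                     # PSD by diagonal dominance

# b) minimum eigenvalue: add -lambda_min(Q) to every diagonal entry
delta_b = -np.linalg.eigvalsh(Q).min() * np.ones(4)
Qb = Q + np.diag(delta_b)                     # PSD with smallest eigenvalue 0

# On {0,1}^n, x_i^2 = x_i, so subtracting delta from the linear term leaves f unchanged:
#   x^T Qa x - delta_a^T x == x^T Q x for all binary x (likewise for Qb, delta_b).
print(np.diag(Qa), delta_a)                            # diag = (7, 9, 6, 6), q_hat = (6, 7, 4, 8)
print(np.round(np.diag(Qb), 2), np.round(delta_b, 2))  # diag approx. (6.17, 7.17, 7.17, 3.17), q_hat approx. 5.17
```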
Convexification of 0-1 QPs
c) The best diagonal. The QCR (SDP-based) method computes the diagonal perturbation that gives the largest value of the continuous relaxation.

Q̂ = [ 2.93   2     −3     2
       2     4.28  −3     4
      −3    −3      6.83  0
       2     4      0     6.20 ],   q̂ = (1.93, 2.28, 4.83, 8.20)ᵀ,   eig(Q̂) = (0, 1.31, 6.71, 12.21)

min xᵀQ̂x − q̂ᵀx   s.t.  x ∈ [0,1]^4        optimal value = −4.08

Original binary problem:  min xᵀQx   s.t.  x ∈ {0,1}^4        optimal value = −3

Bounding:  −5.93 ≤ −5.34 ≤ −4.08 ≤ −3
Convexification of 0-1 QPs
Recall the semidefinite relaxation above:  min Q • X + qᵀx  s.t.  Ax = a,  Bx ≤ b,  diag(X) = x,  [1, xᵀ; x, X] ≽ 0.
Deriving the optimal diagonal
Lagrangian relaxation of the 0-1 QP:

f(x, λ, μ, δ) = xᵀQx + qᵀx + λᵀ(Ax − a) + μᵀ(Bx − b) + Σ_{i=1}^n δ_i (x_i² − x_i)
              = xᵀ(Q + Diag(δ))x + (q + Aᵀλ + Bᵀμ − δ)ᵀx − λᵀa − μᵀb
              = xᵀ Q̄ x + q̄ᵀx + c̄

with Q̄ = Q + Diag(δ),  q̄ = q + Aᵀλ + Bᵀμ − δ,  c̄ = −λᵀa − μᵀb.

Lagrangian dual problem:

sup_{δ, λ, μ}  inf_{x ∈ R^n}  xᵀQ̄x + q̄ᵀx + c̄

which equals the semidefinite program

max   t
s.t.  [ −t + c̄   ½ q̄ᵀ
        ½ q̄       Q̄   ] ≽ 0
      δ ∈ R^n,  λ ∈ R^m,  μ ∈ R₊^k
Convexification of 0-1 QPs
The primal semidefinite relaxation and the Lagrangian dual SDP above are solved as a pair.
Solving the dual gives the optimal multipliers δ*, λ*, μ*; solving the primal gives the optimal x* and X*.
The multipliers δ* of the constraints x_i² = x_i are used to construct the "best" diagonal perturbation of the matrix Q:
Q* = Q + Diag(δ*).
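For the unconstrained 4×4 example (q = 0, no A or B) the dual SDP simplifies to  max t  s.t.  [−t, −½δᵀ; −½δ, Q + Diag(δ)] ≽ 0.  A hedged CVXPY sketch (CVXPY and its default SDP solver are assumptions) that recovers δ*:

```python
import cvxpy as cp
import numpy as np

Q = np.array([[ 1.,  2., -3.,  2.],
              [ 2.,  2., -3.,  4.],
              [-3., -3.,  2.,  0.],
              [ 2.,  4.,  0., -2.]])
n = Q.shape[0]

t = cp.Variable()
delta = cp.Variable(n)

# Build the LMI block [[-t, -delta^T/2], [-delta/2, Q + Diag(delta)]] and require it to be PSD.
M = cp.Variable((n + 1, n + 1), symmetric=True)
constraints = [M[0, 0] == -t,
               M[0, 1:] == -delta / 2,
               M[1:, 1:] == Q + cp.diag(delta),
               M >> 0]
prob = cp.Problem(cp.Maximize(t), constraints)
prob.solve()

print("dual bound t* =", t.value)             # a lower bound on the binary optimum
print("diagonal delta* =", delta.value)       # plays the role of the 'best' diagonal: Q* = Q + Diag(delta*)
```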
Summary
A short introduction to semidefinite programming
SDP is a very general form of convex programming.
It includes LP, QP and SOCP as special cases.
SDP can be used, for example, for relaxation and reformulation of hard combinatorial problems.
SDP has many applications in modern control theory, statistics, mechanics and various problems connected to eigenvalues and singular values.