Ray Pörn, principal lecturer, Åbo Akademi University / Novia University of Applied Sciences. Semidefinite Programming – Basics and Applications

Seminars in applied mathematics (5 credits). Slides: users.abo.fi/khaggblo/npcw21/tutorial/SDPSlides.pdf (2018-01-17)


  • Ray Pörn, principal lecturer
    Åbo Akademi University
    Novia University of Applied Sciences

    Semidefinite Programming – Basics and Applications

  • Content

    What is semidefinite programming (SDP)?

    How to represent different constraints

    Representability

    Relaxation techniques

    Reformulation strategies

  • Convex optimization

    General form:

    minimize   f(x)
    subject to x ∈ X

    where f is a convex function and X is a convex set.

    Why is convex optimization important?

    Many practical problems can be posed as convex programs

    Local optimum = global optimum

    Hard non-convex problems can be approximated with convex ones

    Efficient (polynomial time) algorithms exist

  • Basic linear algebra and notation

    Definition: A symmetric matrix A is called positive semidefinite if x^T A x ≥ 0 for all vectors x ∈ R^n.
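The definition can be checked numerically. A minimal sketch (assuming numpy is available), using the standard fact that a symmetric matrix is PSD iff all of its eigenvalues are non-negative:

```python
import numpy as np

# Sketch: A is PSD iff x^T A x >= 0 for all x; for symmetric A this is
# equivalent to all eigenvalues being >= 0 (up to numerical tolerance).
def is_psd(A, tol=1e-9):
    """Check positive semidefiniteness of a symmetric matrix via eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3 -> PSD
B = np.array([[1.0, 3.0], [3.0, 2.0]])     # det < 0 -> indefinite
print(is_psd(A), is_psd(B))                # True False
```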

  • The cone of positive semidefinite matrices


    The cone is a convex set.

    This is a convex constraint: 𝑿 ≽ 𝟎

  • Hierarchy of optimization problems

    Optimization problem

    Convex program ("nice") vs. non-convex program ("not so nice")

    Convex side: conic programs (LP, SOCP, SDP) and QP

    Non-convex problems are handled by convexification or approximation

  • Intro to semidefinite programming

    LP ⊂ QP ⊂ QCQP ⊂ SOCP ⊂ SDP

    SDP: semidefinite program
    SOCP: second order conic program
    QCQP: convex quadratically constrained quadratic program
    QP: convex quadratic program
    LP: linear program

    (linear) SDP

    Minimize a linear function over the intersection of an
    affine set and the cone of positive semidefinite matrices:

    minimize   trace(CX)
    subject to trace(A_i X) = b_i,  i = 1, …, m
               X ≽ 0

  • Intro to SDP

    The constant matrices 𝐶, 𝐴1, … , 𝐴𝑚 are assumed to be symmetric.

    Different notations:

    trace(CX) is the natural inner product ⟨C, X⟩ in the space of symmetric
    matrices. trace(CX) is a linear function of the variables x_ij:

    trace(CX) = Tr(CX) = ⟨C, X⟩ = C • X = Σ_{i=1}^n Σ_{j=1}^n c_ij x_ij
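The identity above is easy to verify numerically; a small sketch (assuming numpy) with random symmetric matrices:

```python
import numpy as np

# Sketch: for symmetric C and X, trace(CX) equals the elementwise sum
# sum_ij c_ij * x_ij, i.e. the Frobenius inner product <C, X>.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3)); C = (M + M.T) / 2
M = rng.standard_normal((3, 3)); X = (M + M.T) / 2

lhs = np.trace(C @ X)
rhs = np.sum(C * X)          # elementwise product, then sum over all entries
assert np.isclose(lhs, rhs)
```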

  • Semidefinite programming

    Standard form of SDP:

    minimize   trace(CX)
    s.t.       trace(A_i X) = a_i,  i = 1, …, m
               X ≽ 0

    This form is often called the primal problem. It has a matrix variable X,
    linear equality constraints and one conic constraint (X is psd).

    Equivalent form of SDP:

    minimize   c^T x
    s.t.       Bx = b
               B_0 + x_1 B_1 + x_2 B_2 + ⋯ + x_n B_n ≽ 0

    This form is connected to the dual problem. It has a vector variable x, one
    Linear Matrix Inequality (LMI) and a set of linear equalities.

  • Example

    SDP with

    C = [1 2; 2 3],  A_1 = [1 −1; −1 2],  A_2 = [2 3; 3 0],  a = (8, 6)^T

    and symmetric matrix variable X = [x_1 x_2; x_2 x_3].

    trace(C X)   = x_1 + 4x_2 + 3x_3
    trace(A_1 X) = x_1 − 2x_2 + 2x_3
    trace(A_2 X) = 2x_1 + 6x_2

    Decompose the matrix:

    X = [x_1 x_2; x_2 x_3] = x_1 [1 0; 0 0] + x_2 [0 1; 1 0] + x_3 [0 0; 0 1]

    Define: c = (1, 4, 3)^T,  b = a,  B = [1 −2 2; 2 6 0]

    minimize   x_1 + 4x_2 + 3x_3
    s.t.       x_1 − 2x_2 + 2x_3 = 8
               2x_1 + 6x_2 = 6
               [x_1 x_2; x_2 x_3] ≽ 0

    Equivalently, with the decomposition above:

    minimize   x_1 + 4x_2 + 3x_3
    s.t.       x_1 − 2x_2 + 2x_3 = 8
               2x_1 + 6x_2 = 6
               x_1 [1 0; 0 0] + x_2 [0 1; 1 0] + x_3 [0 0; 0 1] ≽ 0
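The trace expressions in this example can be double-checked numerically; a quick sketch (assuming numpy), evaluated at arbitrary test values of x_1, x_2, x_3:

```python
import numpy as np

# Check the linear trace expressions from the 2x2 example slide.
C  = np.array([[1, 2], [2, 3]])
A1 = np.array([[1, -1], [-1, 2]])
A2 = np.array([[2, 3], [3, 0]])

x1, x2, x3 = 1.0, 2.0, 3.0            # arbitrary test values
X = np.array([[x1, x2], [x2, x3]])    # symmetric matrix variable

assert np.isclose(np.trace(C @ X),  x1 + 4*x2 + 3*x3)
assert np.isclose(np.trace(A1 @ X), x1 - 2*x2 + 2*x3)
assert np.isclose(np.trace(A2 @ X), 2*x1 + 6*x2)
```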

  • Representability - LP

    A set of linear inequality constraints:

    { 2x + 3y ≤ 10        { 10 − 2x − 3y ≥ 0
    { −x + 2y ≤ 5    ⇔    { 5 + x − 2y ≥ 0

    ⇔  [10 − 2x − 3y, 0; 0, 5 + x − 2y] ≽ 0

    since a diagonal matrix is PSD iff all diagonal elements are non-negative.

    LP as an SDP

  • Representability – a convex quadratic constraint

    A convex quadratic constraint: 4x² − 10x + 2 ≤ 0

    Recall: [a b; b c] ≽ 0 ⟺ a ≥ 0, c ≥ 0 and ac − b² ≥ 0

    4x² − 10x + 2 ≤ 0 ⟺ 1·(10x − 2) − (2x)² ≥ 0 ⟺ [1 2x; 2x 10x − 2] ≽ 0

    What about a concave quadratic constraint, such as −4x² − 10x + 2 ≤ 0?
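The equivalence for the convex case can be verified point by point; a sketch (assuming numpy) that compares the scalar inequality with PSD-ness of the 2×2 LMI on a grid:

```python
import numpy as np

# Sketch: 4x^2 - 10x + 2 <= 0 should hold exactly when the matrix
# [[1, 2x], [2x, 10x - 2]] is positive semidefinite.
def lmi_holds(x, tol=1e-9):
    M = np.array([[1.0, 2*x], [2*x, 10*x - 2]])
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

for x in np.linspace(-1.0, 3.0, 81):
    assert (4*x**2 - 10*x + 2 <= 1e-9) == lmi_holds(x)
```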

  • Representability - QP

    A general convex quadratic constraint: x^T Q x + q^T x + q_0 ≤ 0,
    where Q ≽ 0 is factored as Q = R^T R.

    x^T Q x + q^T x + q_0 ≤ 0
    ⟺ x^T R^T R x + q^T x + q_0 ≤ 0
    ⟺ (Rx)^T (Rx) + q^T x + q_0 ≤ 0
    ⟺ −q^T x − q_0 − (Rx)^T I (Rx) ≥ 0
    ⟺ [I, Rx; (Rx)^T, −q^T x − q_0] ≽ 0

  • Representability – Convex QCQP 1

    Nonlinear in x → reformulated to be linear in x and θ

  • Representability – Convex QCQP 2 (another way)

    Nonlinear in x → reformulated to be linear in x, W and θ

  • Representability – SOCP

    A second order conic constraint:

    ‖Qx + d‖ ≤ g^T x + h ⟺ ‖Qx + d‖² ≤ (g^T x + h)²  (with g^T x + h ≥ 0)

    ⟺ (g^T x + h)² − (Qx + d)^T (Qx + d) ≥ 0

    ⟺ [(g^T x + h) I, Qx + d; (Qx + d)^T, g^T x + h] ≽ 0

  • Representability – The Schur complement

    Schur complement lemma: for a symmetric block matrix S = [A B; B^T C]
    with A ≻ 0: S ≽ 0 ⟺ C − B^T A^{−1} B ≽ 0.

  • Nonlinear matrix inequalities

    Riccati inequality (P variable, A, B, Q fixed and R fixed and pos. def.):

    Fractional quadratic inequality:

    General matrix inequality:

    A^T P + P A + P B R^{−1} B^T P + Q ≼ 0

    ⟺ [−A^T P − P A − Q, P B; B^T P, R] ≽ 0
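The Riccati/LMI equivalence follows from the Schur complement (R is positive definite). A numerical sketch (assuming numpy) on one random instance:

```python
import numpy as np

# Sketch: by the Schur complement (R > 0), the Riccati inequality
# A^T P + P A + P B R^{-1} B^T P + Q <= 0 is equivalent to PSD-ness of
# the block matrix [[-A^T P - P A - Q, P B], [B^T P, R]].
rng = np.random.default_rng(3)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
M = rng.standard_normal((m, m)); R = M @ M.T + np.eye(m)   # R positive definite
S = rng.standard_normal((n, n)); P = (S + S.T) / 2         # symmetric P
T = rng.standard_normal((n, n)); Qm = (T + T.T) / 2        # symmetric Q

ric = A.T @ P + P @ A + P @ B @ np.linalg.inv(R) @ B.T @ P + Qm
lmi = np.block([[-A.T @ P - P @ A - Qm, P @ B], [B.T @ P, R]])

ric_neg = bool(np.all(np.linalg.eigvalsh(ric) <= 1e-9))
lmi_psd = bool(np.all(np.linalg.eigvalsh(lmi) >= -1e-9))
assert ric_neg == lmi_psd
```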

  • Application: SDP relaxation of MAX-CUT

    MAX-CUT problem

    Consider an undirected graph G=(V,E) with n vertices and edge weights 𝑤𝑖𝑗 ≥ 0 (𝑤𝑖𝑗 = 𝑤𝑗𝑖) for all edges (𝑖, 𝑗) ∈ 𝐸. Find a subset S of V such that the sum of the weights of the edges that cross from S to V \ S (the complement of S) is maximized.

    MAX-CUT (combinatorial) formulation

    maximize  (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij (1 − x_i x_j)
    s.t.      x_j ∈ {−1, 1},  j = 1, …, n

    MAX-CUT is a non-convex, quadratic combinatorial problem. The problem
    is NP-hard (very difficult to solve to optimality for large instances).

  • Let X = xx^T, that is X_ij = x_i x_j, and x_j ∈ {−1, 1} ⇔ x_j² = 1 ⇔ X_jj = 1.

    This leads to an equivalent formulation of MAX-CUT:

    maximize  (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij − (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij X_ij
    s.t.      X_jj = 1,  j = 1, …, n
              X = xx^T

    We relax the problematic rank-1 constraint X = xx^T to X ≽ 0 and denote
    by W the matrix with elements w_ij.

    SDP relaxation of MAX-CUT:

    maximize  (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij − (1/4) trace(WX)
    s.t.      X_jj = 1,  j = 1, …, n
              X ≽ 0
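The combinatorial objective can be sanity-checked on a small graph. A sketch (assuming numpy; the graph below is an illustrative example, not from the slides) that brute-forces the exact MAX-CUT value via the (1/4) Σ w_ij (1 − x_i x_j) formula:

```python
import numpy as np
from itertools import product

# Sketch: on a tiny 4-vertex graph, (1/4) sum_ij w_ij (1 - x_i x_j) equals the
# weight of the cut induced by the sign vector x in {-1,1}^n; brute-force it.
W = np.array([[0, 1, 2, 0],
              [1, 0, 1, 3],
              [2, 1, 0, 1],
              [0, 3, 1, 0]], dtype=float)   # symmetric edge weights
n = len(W)

def cut_value(x):
    x = np.asarray(x, dtype=float)
    return 0.25 * np.sum(W * (1 - np.outer(x, x)))

best = max(cut_value(x) for x in product([-1, 1], repeat=n))
print(best)   # 7.0  (cut {0,3} vs {1,2})
```

An SDP solver applied to the relaxation would return an upper bound on this exact value.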

  • Classification

    Separation of two sets of points in 𝑹𝒏

    Set 1: X = {x_1, x_2, …, x_N}    Set 2: Y = {y_1, y_2, …, y_M}

    Find a function f(x) that separates X and Y as well as possible. That is,

    f(x_i) > 0 and f(y_i) < 0 for as many points as possible.

    Linear discrimination: f(x) = a^T x + b (LP)

  • Classification

    Quadratic convex discrimination: f(x) = x^T P x + q^T x + r

    Assumption: the separation surface is ellipsoidal (P ≺ 0) and contains all
    points of X and none of the points of Y. This leads to an SDP feasibility
    problem:

    minimize 1^T u + 1^T v

    s.t.


  • SDP representability

    We have seen, for example, that:

    convex quadratic constraints: x^T P x + q^T x + r ≤ 0 and

    second order cone constraints: ‖Ax + b‖ ≤ c^T x + d

    can be represented by linear semidefinite constraints, also called LMIs.

    These constraints (sets) are semidefinite representable (SDr).

    Definition: A convex function is called SDr if its epigraph is SDr.

    We will see that a variety of convex functions admits an SDr. This means

    that the modeling abilities of SDP are far greater than in LP, QP, QCQP and

    SOCP programming.

  • Eigenvalue formulations using SDP

    SDP allows modeling of functions that include eigenvalues and singular
    values of matrices.

    Largest eigenvalue of a symmetric matrix: λ_max(A) ≤ t ⟺ t I − A ≽ 0

    Spectral norm of a symmetric matrix: ‖A‖ ≤ t ⟺ −t I ≼ A ≼ t I
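The SDP representation of the largest eigenvalue is easy to illustrate; a small sketch (assuming numpy):

```python
import numpy as np

# Sketch: lambda_max(A) <= t holds exactly when t*I - A is PSD; this is how
# the largest eigenvalue of a symmetric matrix becomes SDP-representable.
def lmax_le(A, t, tol=1e-9):
    """Test lambda_max(A) <= t via the LMI  t I - A >= 0."""
    return bool(np.all(np.linalg.eigvalsh(t * np.eye(len(A)) - A) >= -tol))

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
print(lmax_le(A, 3.5), lmax_le(A, 2.5))  # True False
```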

  • Singular value formulations using SDP

    The representation of singular values of a general rectangular matrix
    follows from the Schur complement.

    The largest singular value of a matrix (the operator norm):
    σ_max(A) ≤ t ⟺ [t I, A; A^T, t I] ≽ 0

    The sum of the p largest singular values is also SDr.

  • Combinatorial problem: 0-1 Quadratic Program (01 QP)

    A standard 0-1 QP has the form:

    min  x^T Q x + q^T x
    s.t. Ax = a
         Bx ≤ b
         x ∈ {0, 1}^n

    Q, A, B are matrices and q, a, b are vectors of appropriate dimensions.

    Some applications include:

    Max-Cut of a graph (unconstrained)
    Knapsack problems (inequality constrained)
    Graph bipartitioning
    Task allocation
    Quadratic assignment problems
    Coulomb glass
    Boolean least squares

  • SDP relaxation 0-1 QP

    Relaxation of binary x into a positive semidefinite matrix variable X:

    X = xx^T  →  X − xx^T ≽ 0  ⟺  [1, x^T; x, X] ≽ 0

    A quadratic expression in x is linear in X: x^T Q x = Q • X = Σ_i Σ_j Q_ij X_ij

    Binary condition: x_i ∈ {0, 1} ⇔ x_i² − x_i = 0 ⇔ X_ii = x_i ⟺ diag(X) = x

    Semidefinite relaxation:

    min  Q • X + q^T x
    s.t. Ax = a
         Bx ≤ b
         diag(X) = x
         [1, x^T; x, X] ≽ 0

    Gives a tight lower bound on the 0-1 QP.
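The two lifting identities used here can be verified directly; a sketch (assuming numpy) with a random 0-1 vector and a random symmetric Q:

```python
import numpy as np

# Sketch: for binary x, the lift X = x x^T satisfies diag(X) = x,
# and any quadratic x^T Q x becomes the linear expression Q . X = sum_ij Q_ij X_ij.
rng = np.random.default_rng(4)
x = rng.integers(0, 2, size=5).astype(float)   # a 0-1 vector
S = rng.standard_normal((5, 5)); Q = (S + S.T) / 2

X = np.outer(x, x)
assert np.allclose(np.diag(X), x)              # x_i^2 = x_i for binary x_i
assert np.isclose(x @ Q @ x, np.sum(Q * X))    # x^T Q x = Q . X
```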

  • Convexification of 0-1 QPs

    Basic approach: if Q is indefinite, add sufficiently large quadratic terms
    to the diagonal and subtract the same amount from the linear terms.

    Recall that: x_i ∈ {0, 1} ⇔ x_i² = x_i

    Example:

    f(x) = x^T [1 3; 3 2] x = x_1² + 6x_1x_2 + 2x_2²          (indefinite)

    f(x) = x^T [3 3; 3 5] x − (2, 3)^T x
         = 3x_1² + 6x_1x_2 + 5x_2² − 2x_1 − 3x_2              (positive semidefinite)

    Same function on {0,1} × {0,1}.
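The claim that the two forms agree on binary points can be checked exhaustively; a sketch (assuming numpy) over all four points of {0,1}²:

```python
import numpy as np
from itertools import product

# Sketch: the indefinite form x^T [[1,3],[3,2]] x and the convexified form
# x^T [[3,3],[3,5]] x - (2,3)^T x agree on every binary point.
Q1 = np.array([[1, 3], [3, 2]], dtype=float)
Q2 = np.array([[3, 3], [3, 5]], dtype=float)
c  = np.array([2, 3], dtype=float)

for x in product([0.0, 1.0], repeat=2):
    x = np.array(x)
    assert np.isclose(x @ Q1 @ x, x @ Q2 @ x - c @ x)

# Q2 is PSD while Q1 is not:
assert np.all(np.linalg.eigvalsh(Q2) >= -1e-9)
assert np.linalg.eigvalsh(Q1)[0] < 0
```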

  • Convexification of 0-1 QPs

    The following are equivalent (Q = Q^T):

    The quadratic function f(x) = x^T Q x is convex on R^n.

    The matrix Q is positive semidefinite (Q ≽ 0).

    All eigenvalues of Q are non-negative (λ_i ≥ 0).

    A sufficient condition for convexity: a diagonally dominant matrix is PSD.

    Definition: A matrix Q is diagonally dominant if

    Q_ii ≥ Σ_{j≠i} |Q_ij|  ∀i
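The sufficient condition can be checked on the matrix used in example a) of the next slide; a sketch (assuming numpy):

```python
import numpy as np

# Sketch: a symmetric matrix with Q_ii >= sum_{j != i} |Q_ij| in every row
# is PSD (a consequence of Gershgorin's circle theorem).
def is_diag_dominant(Q):
    off = np.sum(np.abs(Q), axis=1) - np.abs(np.diag(Q))
    return bool(np.all(np.diag(Q) >= off))

Q = np.array([[7, 2, -3, 2],
              [2, 9, -3, 4],
              [-3, -3, 6, 0],
              [2, 4, 0, 6]], dtype=float)

assert is_diag_dominant(Q)
assert np.all(np.linalg.eigvalsh(Q) >= -1e-9)   # hence PSD
```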

  • Convexification of 0-1 QPs

    Example: a) diagonal dominance, b) minimum eigenvalue

    Q = [1 2 −3 2; 2 2 −3 4; −3 −3 2 0; 2 4 0 −2],   eig(Q) = (−5.17, −1.04, 0.95, 8.26)

    min  x^T Q x
    s.t. x ∈ {0, 1}^4

    a) Diagonal dominance

    Q̂ = [7 2 −3 2; 2 9 −3 4; −3 −3 6 0; 2 4 0 6],   q̂ = (6, 7, 4, 8)^T

    eig(Q̂) = (1.66, 4.90, 6.88, 14.56)

    min  x^T Q̂ x − q̂^T x
    s.t. x ∈ [0, 1]^4
    optimal value = −5.93

    b) Minimum eigenvalue

    Q̂ = [6.17 2 −3 2; 2 7.17 −3 4; −3 −3 7.17 0; 2 4 0 3.17],   q̂ = (5.17, 5.17, 5.17, 5.17)^T

    eig(Q̂) = (0, 4.13, 6.12, 13.43)

    min  x^T Q̂ x − q̂^T x
    s.t. x ∈ [0, 1]^4
    optimal value = −5.34

  • Convexification of 0-1 QPs

    c) The best diagonal. The QCR (SDP based) method computes the diagonal
    that gives the largest value of the relaxation.

    Q̂ = [2.93 2 −3 2; 2 4.28 −3 4; −3 −3 6.83 0; 2 4 0 6.20],   q̂ = (1.93, 2.28, 4.83, 8.20)^T

    eig(Q̂) = (0, 1.31, 6.71, 12.21)

    min  x^T Q̂ x − q̂^T x
    s.t. x ∈ [0, 1]^4
    optimal value = −4.08

    min  x^T Q x
    s.t. x ∈ {0, 1}^4
    optimal value = −3

    Bounding: −5.93 ≤ −5.34 ≤ −4.08 ≤ −3
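The binary optimum at the top of the bounding chain is small enough to brute-force; a sketch (assuming numpy) confirming the stated value −3:

```python
import numpy as np
from itertools import product

# Sketch: brute-force min x^T Q x over {0,1}^4 for the slide's Q; the
# continuous convex relaxations above bound this value from below.
Q = np.array([[1, 2, -3, 2],
              [2, 2, -3, 4],
              [-3, -3, 2, 0],
              [2, 4, 0, -2]], dtype=float)

best = min(np.array(x) @ Q @ np.array(x) for x in product([0.0, 1.0], repeat=4))
print(best)   # -3.0
```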

  • Convexification of 0-1 QPs

    Relaxation into a positive semidefinite matrix variable:

    X = xx^T  ↦  X − xx^T ≽ 0  ⟺  [1, x^T; x, X] ≽ 0

    Binary condition: x_i ∈ {0, 1} ⇔ x_i² − x_i = 0 ⇔ X_ii = x_i

    A quadratic expression in x is linear in X: x^T Q x = Q • X = Σ_i Σ_j Q_ij X_ij

    Semidefinite relaxation:

    min  Q • X + q^T x
    s.t. Ax = a
         Bx ≤ b
         diag(X) = x
         [1, x^T; x, X] ≽ 0

  • Deriving the optimal diagonal

    Lagrangian relaxation of the 0-1 QP:

    f(x, λ, μ, δ) = x^T Q x + q^T x + λ^T (Ax − a) + μ^T (Bx − b) + Σ_{i=1}^n δ_i (x_i² − x_i)

                  = x^T Q̄ x + q̄^T x + c̄

    with Q̄ = Q + Diag(δ),  q̄ = q + A^T λ + B^T μ − δ,  c̄ = −λ^T a − μ^T b.

    Lagrangian dual problem:

    sup_{δ, λ, μ}  inf_{x ∈ R^n}  x^T Q̄ x + q̄^T x + c̄

    which equals a semidefinite program:

    max  t
    s.t. [−t + c̄, (1/2) q̄^T; (1/2) q̄, Q̄] ≽ 0
         δ ∈ R^n, λ ∈ R^m, μ ∈ R^k_+

  • Convexification of 0-1 QPs

    max  t
    s.t. [−t + c̄, (1/2) q̄^T; (1/2) q̄, Q̄] ≽ 0
         δ ∈ R^n, λ ∈ R^m, μ ∈ R^k_+

    min  Q • X + q^T x
    s.t. Ax = a
         Bx ≤ b
         diag(X) = x
         [1, x^T; x, X] ≽ 0

    Solution of the dual gives optimal values δ*, λ*, μ*.
    Solution of the primal gives optimal values x* and X*.

    The multipliers from the constraints x_i² = x_i are used to construct the
    "best" diagonal perturbation of matrix Q according to

    Q* = Q + Diag(δ*).

  • Summary

    A short introduction to semidefinite programming

    SDP is a general form of a convex program.

    It includes LP, QP and SOCP as special cases.

    SDP can be used, for example, for relaxation and reformulation of hard combinatorial problems.

    SDP has many applications in modern control theory, statistics, mechanics and various problems connected to eigenvalues and singular values.