Ray Pörn, principal lecturer, Åbo Akademi University / Novia University of Applied Sciences. Semidefinite Programming – Basics and Applications

Seminars in applied mathematics (5 credits). Slides: users.abo.fi/khaggblo/npcw21/tutorial/SDPSlides.pdf (2018-01-17)


  • Ray Pörn, principal lecturer
    Åbo Akademi University
    Novia University of Applied Sciences

    Semidefinite Programming – Basics and Applications

  • Content

    What is semidefinite programming (SDP)?

    How to represent different constraints

    Representability

    Relaxation techniques

    Reformulation strategies

  • Convex optimization

    General form:

    minimize   f(x)
    subject to x ∈ X

    where f is a convex function and X is a convex set.

    Why is convex optimization important?

    Many practical problems can be posed as convex programs

    Local optimum = global optimum

    Hard non-convex problems can be approximated with convex ones

    Efficient (polynomial time) algorithms exist

  • Basic linear algebra and notation

    Definition: A symmetric matrix A is called positive semidefinite if x^T A x ≥ 0 for all vectors x ∈ R^n.
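The definition can be checked numerically. A minimal sketch (assuming numpy is available), using the standard fact that a symmetric matrix is PSD iff all of its eigenvalues are non-negative:

```python
import numpy as np

# Sketch: A is PSD iff x^T A x >= 0 for all x; for symmetric A this is
# equivalent to all eigenvalues being >= 0 (up to numerical tolerance).
def is_psd(A, tol=1e-9):
    """Check positive semidefiniteness of a symmetric matrix via eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3 -> PSD
B = np.array([[1.0, 3.0], [3.0, 2.0]])     # det < 0 -> indefinite
print(is_psd(A), is_psd(B))                # True False
```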

  • The cone of positive semidefinite matrices


    The cone is a convex set.

    This is a convex constraint: 𝑿 ≽ 𝟎

  • Hierarchy of optimization problems

    Optimization problem

    Convex program ("nice") vs. non-convex program ("not so nice")

    Convex side: conic programs (LP, SOCP, SDP) and QP

    Non-convex problems are handled by convexification or approximation

  • Intro to semidefinite programming

    LP ⊂ QP ⊂ QCQP ⊂ SOCP ⊂ SDP

    SDP: semidefinite program
    SOCP: second order conic program
    QCQP: convex quadratically constrained quadratic program
    QP: convex quadratic program
    LP: linear program

    (linear) SDP

    Minimize a linear function over the intersection of an
    affine set and the cone of positive semidefinite matrices:

    minimize   trace(CX)
    subject to trace(A_i X) = b_i,  i = 1, …, m
               X ≽ 0

  • Intro to SDP

    The constant matrices 𝐶, 𝐴1, … , 𝐴𝑚 are assumed to be symmetric.

    Different notations:

    trace(CX) is the natural inner product ⟨C, X⟩ in the space of symmetric
    matrices. trace(CX) is a linear function of the variables x_ij:

    trace(CX) = Tr(CX) = ⟨C, X⟩ = C • X = Σ_{i=1}^n Σ_{j=1}^n c_ij x_ij
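The identity above is easy to verify numerically; a small sketch (assuming numpy) with random symmetric matrices:

```python
import numpy as np

# Sketch: for symmetric C and X, trace(CX) equals the elementwise sum
# sum_ij c_ij * x_ij, i.e. the Frobenius inner product <C, X>.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3)); C = (M + M.T) / 2
M = rng.standard_normal((3, 3)); X = (M + M.T) / 2

lhs = np.trace(C @ X)
rhs = np.sum(C * X)          # elementwise product, then sum over all entries
assert np.isclose(lhs, rhs)
```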

  • Semidefinite programming

    Standard form of SDP:

    minimize   trace(CX)
    s.t.       trace(A_i X) = a_i,  i = 1, …, m
               X ≽ 0

    This form is often called the primal problem. It has a matrix variable X,
    linear equality constraints and one conic constraint (X is psd).

    Equivalent form of SDP:

    minimize   c^T x
    s.t.       Bx = b
               B_0 + x_1 B_1 + x_2 B_2 + ⋯ + x_n B_n ≽ 0

    This form is connected to the dual problem. It has a vector variable x, one
    Linear Matrix Inequality (LMI) and a set of linear equalities.

  • Example

    SDP with

    C = [1 2; 2 3],  A_1 = [1 −1; −1 2],  A_2 = [2 3; 3 0],  a = (8, 6)^T

    and symmetric matrix variable X = [x_1 x_2; x_2 x_3].

    trace(C X)   = x_1 + 4x_2 + 3x_3
    trace(A_1 X) = x_1 − 2x_2 + 2x_3
    trace(A_2 X) = 2x_1 + 6x_2

    Decompose the matrix:

    X = [x_1 x_2; x_2 x_3] = x_1 [1 0; 0 0] + x_2 [0 1; 1 0] + x_3 [0 0; 0 1]

    Define: c = (1, 4, 3)^T,  b = a,  B = [1 −2 2; 2 6 0]

    minimize   x_1 + 4x_2 + 3x_3
    s.t.       x_1 − 2x_2 + 2x_3 = 8
               2x_1 + 6x_2 = 6
               [x_1 x_2; x_2 x_3] ≽ 0

    Equivalently, with the decomposition above:

    minimize   x_1 + 4x_2 + 3x_3
    s.t.       x_1 − 2x_2 + 2x_3 = 8
               2x_1 + 6x_2 = 6
               x_1 [1 0; 0 0] + x_2 [0 1; 1 0] + x_3 [0 0; 0 1] ≽ 0
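The trace expressions in this example can be double-checked numerically; a quick sketch (assuming numpy), evaluated at arbitrary test values of x_1, x_2, x_3:

```python
import numpy as np

# Check the linear trace expressions from the 2x2 example slide.
C  = np.array([[1, 2], [2, 3]])
A1 = np.array([[1, -1], [-1, 2]])
A2 = np.array([[2, 3], [3, 0]])

x1, x2, x3 = 1.0, 2.0, 3.0            # arbitrary test values
X = np.array([[x1, x2], [x2, x3]])    # symmetric matrix variable

assert np.isclose(np.trace(C @ X),  x1 + 4*x2 + 3*x3)
assert np.isclose(np.trace(A1 @ X), x1 - 2*x2 + 2*x3)
assert np.isclose(np.trace(A2 @ X), 2*x1 + 6*x2)
```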

  • Representability - LP

    A set of linear inequality constraints:

    { 2x + 3y ≤ 10        { 10 − 2x − 3y ≥ 0
    { −x + 2y ≤ 5    ⇔    { 5 + x − 2y ≥ 0

    ⇔  [10 − 2x − 3y, 0; 0, 5 + x − 2y] ≽ 0

    since a diagonal matrix is PSD iff all diagonal elements are non-negative.

    LP as an SDP

  • Representability – a convex quadratic constraint

    A convex quadratic constraint: 4x² − 10x + 2 ≤ 0

    Recall: [a b; b c] ≽ 0 ⟺ a ≥ 0, c ≥ 0 and ac − b² ≥ 0

    4x² − 10x + 2 ≤ 0 ⟺ 1·(10x − 2) − (2x)² ≥ 0 ⟺ [1 2x; 2x 10x − 2] ≽ 0

    What about a concave quadratic constraint, such as −4x² − 10x + 2 ≤ 0?
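The equivalence for the convex case can be verified point by point; a sketch (assuming numpy) that compares the scalar inequality with PSD-ness of the 2×2 LMI on a grid:

```python
import numpy as np

# Sketch: 4x^2 - 10x + 2 <= 0 should hold exactly when the matrix
# [[1, 2x], [2x, 10x - 2]] is positive semidefinite.
def lmi_holds(x, tol=1e-9):
    M = np.array([[1.0, 2*x], [2*x, 10*x - 2]])
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

for x in np.linspace(-1.0, 3.0, 81):
    assert (4*x**2 - 10*x + 2 <= 1e-9) == lmi_holds(x)
```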

  • Representability - QP

    A general convex quadratic constraint: x^T Q x + q^T x + q_0 ≤ 0,
    where Q ≽ 0 is factored as Q = R^T R.

    x^T Q x + q^T x + q_0 ≤ 0
    ⟺ x^T R^T R x + q^T x + q_0 ≤ 0
    ⟺ (Rx)^T (Rx) + q^T x + q_0 ≤ 0
    ⟺ −q^T x − q_0 − (Rx)^T I (Rx) ≥ 0
    ⟺ [I, Rx; (Rx)^T, −q^T x − q_0] ≽ 0

  • Representability – Convex QCQP 1

    Nonlinear in x → reformulated to be linear in x and θ

  • Representability – Convex QCQP 2 (another way)

    Nonlinear in x → reformulated to be linear in x, W and θ

  • Representability – SOCP

    A second order conic constraint:

    ‖Qx + d‖ ≤ g^T x + h ⟺ ‖Qx + d‖² ≤ (g^T x + h)²  (with g^T x + h ≥ 0)

    ⟺ (g^T x + h)² − (Qx + d)^T (Qx + d) ≥ 0

    ⟺ [(g^T x + h) I, Qx + d; (Qx + d)^T, g^T x + h] ≽ 0

  • Representability – The Schur complement

    Schur complement lemma: for a symmetric block matrix S = [A B; B^T C]
    with A ≻ 0: S ≽ 0 ⟺ C − B^T A^{−1} B ≽ 0.

  • Nonlinear matrix inequalities

    Riccati inequality (P variable, A, B, Q fixed and R fixed and pos. def.):

    Fractional quadratic inequality:

    General matrix inequality:

    A^T P + P A + P B R^{−1} B^T P + Q ≼ 0

    ⟺ [−A^T P − P A − Q, P B; B^T P, R] ≽ 0
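The Riccati/LMI equivalence follows from the Schur complement (R is positive definite). A numerical sketch (assuming numpy) on one random instance:

```python
import numpy as np

# Sketch: by the Schur complement (R > 0), the Riccati inequality
# A^T P + P A + P B R^{-1} B^T P + Q <= 0 is equivalent to PSD-ness of
# the block matrix [[-A^T P - P A - Q, P B], [B^T P, R]].
rng = np.random.default_rng(3)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
M = rng.standard_normal((m, m)); R = M @ M.T + np.eye(m)   # R positive definite
S = rng.standard_normal((n, n)); P = (S + S.T) / 2         # symmetric P
T = rng.standard_normal((n, n)); Qm = (T + T.T) / 2        # symmetric Q

ric = A.T @ P + P @ A + P @ B @ np.linalg.inv(R) @ B.T @ P + Qm
lmi = np.block([[-A.T @ P - P @ A - Qm, P @ B], [B.T @ P, R]])

ric_neg = bool(np.all(np.linalg.eigvalsh(ric) <= 1e-9))
lmi_psd = bool(np.all(np.linalg.eigvalsh(lmi) >= -1e-9))
assert ric_neg == lmi_psd
```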

  • Application: SDP relaxation of MAX-CUT

    MAX-CUT problem

    Consider an undirected graph G=(V,E) with n vertices and edge weights 𝑤𝑖𝑗 ≥ 0 (𝑤𝑖𝑗 = 𝑤𝑗𝑖) for all edges (𝑖, 𝑗) ∈ 𝐸. Find a subset S of V such that the sum of the weights of the edges that cross from S to V \ S (the complement of S) is maximized.

    MAX-CUT (combinatorial) formulation

    maximize  (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij (1 − x_i x_j)
    s.t.      x_j ∈ {−1, 1},  j = 1, …, n

    MAX-CUT is a non-convex, quadratic combinatorial problem. The problem
    is NP-hard (very difficult to solve to optimality for large instances).

  • Let X = xx^T, that is X_ij = x_i x_j, and x_j ∈ {−1, 1} ⇔ x_j² = 1 ⇔ X_jj = 1.

    This leads to an equivalent formulation of MAX-CUT:

    maximize  (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij − (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij X_ij
    s.t.      X_jj = 1,  j = 1, …, n
              X = xx^T

    We relax the problematic rank-1 constraint X = xx^T to X ≽ 0 and denote
    by W the matrix with elements w_ij.

    SDP relaxation of MAX-CUT:

    maximize  (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij − (1/4) trace(WX)
    s.t.      X_jj = 1,  j = 1, …, n
              X ≽ 0
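The combinatorial objective can be sanity-checked on a small graph. A sketch (assuming numpy; the graph below is an illustrative example, not from the slides) that brute-forces the exact MAX-CUT value via the (1/4) Σ w_ij (1 − x_i x_j) formula:

```python
import numpy as np
from itertools import product

# Sketch: on a tiny 4-vertex graph, (1/4) sum_ij w_ij (1 - x_i x_j) equals the
# weight of the cut induced by the sign vector x in {-1,1}^n; brute-force it.
W = np.array([[0, 1, 2, 0],
              [1, 0, 1, 3],
              [2, 1, 0, 1],
              [0, 3, 1, 0]], dtype=float)   # symmetric edge weights
n = len(W)

def cut_value(x):
    x = np.asarray(x, dtype=float)
    return 0.25 * np.sum(W * (1 - np.outer(x, x)))

best = max(cut_value(x) for x in product([-1, 1], repeat=n))
print(best)   # 7.0  (cut {0,3} vs {1,2})
```

An SDP solver applied to the relaxation would return an upper bound on this exact value.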

  • Classification

    Separation of two sets of points in 𝑹𝒏

    Set 1: X = {x_1, x_2, …, x_N}    Set 2: Y = {y_1, y_2, …, y_M}

    Find a function f(x) that separates X and Y as well as possible. That is,

    f(x_i) > 0 and f(y_i) < 0 for as many points as possible.

    Linear discrimination: f(x) = a^T x + b (LP)

  • Classification

    Quadratic convex discrimination: f(x) = x^T P x + q^T x + r

    Assumption: the separation surface is ellipsoidal (P ≺ 0) and contains all
    points of X and none of the points of Y. This leads to an SDP feasibility
    problem:

    minimize 1^T u + 1^T v

    s.t.


  • SDP representability

    We have seen, for example, that:

    convex quadratic constraints: x^T P x + q^T x + r ≤ 0 and

    second order cone constraints: ‖Ax + b‖ ≤ c^T x + d

    can be represented by linear semidefinite constraints, also called LMIs.

    These constraints (sets) are semidefinite representable (SDr).

    Definition: A convex function is called SDr if its epigraph is SDr.

    We will see that a variety of convex functions admits an SDr. This means

    that the modeling abilities of SDP are far greater than in LP, QP, QCQP and

    SOCP programming.

  • Eigenvalue formulations using SDP

    SDP allows modeling of functions that include eigenvalues and singular
    values of matrices.

    Largest eigenvalue of a symmetric matrix: λ_max(A) ≤ t ⟺ t I − A ≽ 0

    Spectral norm of a symmetric matrix: ‖A‖ ≤ t ⟺ −t I ≼ A ≼ t I
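The SDP representation of the largest eigenvalue is easy to illustrate; a small sketch (assuming numpy):

```python
import numpy as np

# Sketch: lambda_max(A) <= t holds exactly when t*I - A is PSD; this is how
# the largest eigenvalue of a symmetric matrix becomes SDP-representable.
def lmax_le(A, t, tol=1e-9):
    """Test lambda_max(A) <= t via the LMI  t I - A >= 0."""
    return bool(np.all(np.linalg.eigvalsh(t * np.eye(len(A)) - A) >= -tol))

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
print(lmax_le(A, 3.5), lmax_le(A, 2.5))  # True False
```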

  • Singular value formulations using SDP

    The representation of singular values of a general rectangular matrix
    follows from the Schur complement.

    The largest singular value of a matrix (the operator norm):
    σ_max(A) ≤ t ⟺ [t I, A; A^T, t I] ≽ 0

    The sum of the p largest singular values is also SDr.

  • Combinatorial problem: 0-1 Quadratic Program (01 QP)

    A standard 0-1 QP has the form:

    min  x^T Q x + q^T x
    s.t. Ax = a
         Bx ≤ b
         x ∈ {0, 1}^n

    Q, A, B are matrices and q, a, b are vectors of appropriate dimensions.

    Some applications include:

    Max-Cut of a graph (unconstrained)
    Knapsack problems (inequality constrained)
    Graph bipartitioning
    Task allocation
    Quadratic assignment problems
    Coulomb glass
    Boolean least squares

  • SDP relaxation 0-1 QP

    Relaxation of binary x into a positive semidefinite matrix variable X:

    X = xx^T  →  X − xx^T ≽ 0  ⟺  [1, x^T; x, X] ≽ 0

    A quadratic expression in x is linear in X: x^T Q x = Q • X = Σ_i Σ_j Q_ij X_ij

    Binary condition: x_i ∈ {0, 1} ⇔ x_i² − x_i = 0 ⇔ X_ii = x_i ⟺ diag(X) = x

    Semidefinite relaxation:

    min  Q • X + q^T x
    s.t. Ax = a
         Bx ≤ b
         diag(X) = x
         [1, x^T; x, X] ≽ 0

    Gives a tight lower bound on the 0-1 QP.
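The two lifting identities used here can be verified directly; a sketch (assuming numpy) with a random 0-1 vector and a random symmetric Q:

```python
import numpy as np

# Sketch: for binary x, the lift X = x x^T satisfies diag(X) = x,
# and any quadratic x^T Q x becomes the linear expression Q . X = sum_ij Q_ij X_ij.
rng = np.random.default_rng(4)
x = rng.integers(0, 2, size=5).astype(float)   # a 0-1 vector
S = rng.standard_normal((5, 5)); Q = (S + S.T) / 2

X = np.outer(x, x)
assert np.allclose(np.diag(X), x)              # x_i^2 = x_i for binary x_i
assert np.isclose(x @ Q @ x, np.sum(Q * X))    # x^T Q x = Q . X
```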

  • Convexification of 0-1 QPs

    Basic approach: if Q is indefinite, add sufficiently large quadratic terms
    to the diagonal and subtract the same amount from the linear terms.

    Recall that: x_i ∈ {0, 1} ⇔ x_i² = x_i

    Example:

    f(x) = x^T [1 3; 3 2] x = x_1² + 6x_1x_2 + 2x_2²          (indefinite)

    f(x) = x^T [3 3; 3 5] x − (2, 3)^T x
         = 3x_1² + 6x_1x_2 + 5x_2² − 2x_1 − 3x_2              (positive semidefinite)

    Same function on {0,1} × {0,1}.
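The claim that the two forms agree on binary points can be checked exhaustively; a sketch (assuming numpy) over all four points of {0,1}²:

```python
import numpy as np
from itertools import product

# Sketch: the indefinite form x^T [[1,3],[3,2]] x and the convexified form
# x^T [[3,3],[3,5]] x - (2,3)^T x agree on every binary point.
Q1 = np.array([[1, 3], [3, 2]], dtype=float)
Q2 = np.array([[3, 3], [3, 5]], dtype=float)
c  = np.array([2, 3], dtype=float)

for x in product([0.0, 1.0], repeat=2):
    x = np.array(x)
    assert np.isclose(x @ Q1 @ x, x @ Q2 @ x - c @ x)

# Q2 is PSD while Q1 is not:
assert np.all(np.linalg.eigvalsh(Q2) >= -1e-9)
assert np.linalg.eigvalsh(Q1)[0] < 0
```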

  • Convexification of 0-1 QPs

    The following are equivalent (Q = Q^T):

    The quadratic function f(x) = x^T Q x is convex on R^n.

    The matrix Q is positive semidefinite (Q ≽ 0).

    All eigenvalues of Q are non-negative (λ_i ≥ 0).

    A sufficient condition for convexity: a diagonally dominant matrix is PSD.

    Definition: A matrix Q is diagonally dominant if

    Q_ii ≥ Σ_{j≠i} |Q_ij|  ∀i
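The sufficient condition can be checked on the matrix used in example a) of the next slide; a sketch (assuming numpy):

```python
import numpy as np

# Sketch: a symmetric matrix with Q_ii >= sum_{j != i} |Q_ij| in every row
# is PSD (a consequence of Gershgorin's circle theorem).
def is_diag_dominant(Q):
    off = np.sum(np.abs(Q), axis=1) - np.abs(np.diag(Q))
    return bool(np.all(np.diag(Q) >= off))

Q = np.array([[7, 2, -3, 2],
              [2, 9, -3, 4],
              [-3, -3, 6, 0],
              [2, 4, 0, 6]], dtype=float)

assert is_diag_dominant(Q)
assert np.all(np.linalg.eigvalsh(Q) >= -1e-9)   # hence PSD
```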

  • Convexification of 0-1 QPs

    Example: a) diagonal dominance, b) minimum eigenvalue

    Q = [1 2 −3 2; 2 2 −3 4; −3 −3 2 0; 2 4 0 −2],   eig(Q) = (−5.17, −1.04, 0.95, 8.26)

    min  x^T Q x
    s.t. x ∈ {0, 1}^4

    a) Diagonal dominance

    Q̂ = [7 2 −3 2; 2 9 −3 4; −3 −3 6 0; 2 4 0 6],   q̂ = (6, 7, 4, 8)^T

    eig(Q̂) = (1.66, 4.90, 6.88, 14.56)

    min  x^T Q̂ x − q̂^T x
    s.t. x ∈ [0, 1]^4
    optimal value = −5.93

    b) Minimum eigenvalue

    Q̂ = [6.17 2 −3 2; 2 7.17 −3 4; −3 −3 7.17 0; 2 4 0 3.17],   q̂ = (5.17, 5.17, 5.17, 5.17)^T

    eig(Q̂) = (0, 4.13, 6.12, 13.43)

    min  x^T Q̂ x − q̂^T x
    s.t. x ∈ [0, 1]^4
    optimal value = −5.34

  • Convexification of 0-1 QPs

    c) The best diagonal. The QCR (SDP based) method computes the diagonal
    that gives the largest value of the relaxation.

    Q̂ = [2.93 2 −3 2; 2 4.28 −3 4; −3 −3 6.83 0; 2 4 0 6.20],   q̂ = (1.93, 2.28, 4.83, 8.20)^T

    eig(Q̂) = (0, 1.31, 6.71, 12.21)

    min  x^T Q̂ x − q̂^T x
    s.t. x ∈ [0, 1]^4
    optimal value = −4.08

    min  x^T Q x
    s.t. x ∈ {0, 1}^4
    optimal value = −3

    Bounding: −5.93 ≤ −5.34 ≤ −4.08 ≤ −3
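The binary optimum at the top of the bounding chain is small enough to brute-force; a sketch (assuming numpy) confirming the stated value −3:

```python
import numpy as np
from itertools import product

# Sketch: brute-force min x^T Q x over {0,1}^4 for the slide's Q; the
# continuous convex relaxations above bound this value from below.
Q = np.array([[1, 2, -3, 2],
              [2, 2, -3, 4],
              [-3, -3, 2, 0],
              [2, 4, 0, -2]], dtype=float)

best = min(np.array(x) @ Q @ np.array(x) for x in product([0.0, 1.0], repeat=4))
print(best)   # -3.0
```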

  • Convexification of 0-1 QPs

    Relaxation into a positive semidefinite matrix variable:

    X = xx^T  ↦  X − xx^T ≽ 0  ⟺  [1, x^T; x, X] ≽ 0

    Binary condition: x_i ∈ {0, 1} ⇔ x_i² − x_i = 0 ⇔ X_ii = x_i

    A quadratic expression in x is linear in X: x^T Q x = Q • X = Σ_i Σ_j Q_ij X_ij

    Semidefinite relaxation:

    min  Q • X + q^T x
    s.t. Ax = a
         Bx ≤ b
         diag(X) = x
         [1, x^T; x, X] ≽ 0

  • Deriving the optimal diagonal

    Lagrangian relaxation of the 0-1 QP:

    f(x, λ, μ, δ) = x^T Q x + q^T x + λ^T (Ax − a) + μ^T (Bx − b) + Σ_{i=1}^n δ_i (x_i² − x_i)

                  = x^T Q̄ x + q̄^T x + c̄

    with Q̄ = Q + Diag(δ),  q̄ = q + A^T λ + B^T μ − δ,  c̄ = −λ^T a − μ^T b.

    Lagrangian dual problem:

    sup_{δ, λ, μ}  inf_{x ∈ R^n}  x^T Q̄ x + q̄^T x + c̄

    which equals a semidefinite program:

    max  t
    s.t. [−t + c̄, (1/2) q̄^T; (1/2) q̄, Q̄] ≽ 0
         δ ∈ R^n, λ ∈ R^m, μ ∈ R^k_+

  • Convexification of 0-1 QPs

    max  t
    s.t. [−t + c̄, (1/2) q̄^T; (1/2) q̄, Q̄] ≽ 0
         δ ∈ R^n, λ ∈ R^m, μ ∈ R^k_+

    min  Q • X + q^T x
    s.t. Ax = a
         Bx ≤ b
         diag(X) = x
         [1, x^T; x, X] ≽ 0

    Solution of the dual gives optimal values δ*, λ*, μ*.
    Solution of the primal gives optimal values x* and X*.

    The multipliers from the constraints x_i² = x_i are used to construct the
    "best" diagonal perturbation of matrix Q according to

    Q* = Q + Diag(δ*).

  • Summary

    A short introduction to semidefinite programming

    SDP is a general form of a convex program.

    It includes LP, QP and SOCP as special cases.

    SDP can be used, for example, for relaxation and reformulation of hard combinatorial problems.

    SDP has many applications in modern control theory, statistics, mechanics and various problems connected to eigenvalues and singular values.