Alternating Direction Method of Multipliers

Prof S. Boyd

EE364b, Stanford University

source:

Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers (Boyd, Parikh, Chu, Peleato, Eckstein)

1

Goals

robust methods for

◮ arbitrary-scale optimization

– machine learning/statistics with huge data-sets

– dynamic optimization on a large-scale network

◮ decentralized optimization

– devices/processors/agents coordinate to solve large problem, by passing relatively small messages

2

Outline

Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions

Dual decomposition 3

Dual problem

◮ convex equality constrained optimization problem

minimize   f(x)
subject to Ax = b

◮ Lagrangian: L(x, y) = f(x) + yT (Ax− b)

◮ dual function: g(y) = infx L(x, y)

◮ dual problem: maximize g(y)

◮ recover x⋆ = argminx L(x, y⋆)

Dual decomposition 4

Dual ascent

◮ gradient method for dual problem: yk+1 = yk + αk∇g(yk)

◮ ∇g(yk) = Ax− b, where x = argminx L(x, yk)

◮ dual ascent method is

xk+1 := argminx L(x, yk) // x-minimization

yk+1 := yk + αk(Axk+1 − b) // dual update

◮ works, with lots of strong assumptions
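A minimal NumPy sketch of dual ascent (my illustration, not from the slides), using the toy problem minimize (1/2)‖x‖22 subject to Ax = b so the x-minimization has a closed form; the data and step size are assumptions chosen for the demo:

    import numpy as np

    # toy instance: minimize (1/2)||x||^2 subject to Ax = b
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 10))
    b = rng.standard_normal(3)

    y = np.zeros(3)
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2   # conservative fixed step size
    for k in range(500):
        x = -A.T @ y                          # x-minimization: argmin_x L(x, y)
        y = y + alpha * (A @ x - b)           # dual update; grad g(y) = Ax - b
    print(np.linalg.norm(A @ x - b))          # primal residual, should be small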

Dual decomposition 5

Dual decomposition

◮ suppose f is separable:

f(x) = f1(x1) + · · ·+ fN (xN ), x = (x1, . . . , xN )

◮ then L is separable in x: L(x, y) = L1(x1, y) + · · ·+ LN (xN , y)− yT b,

Li(xi, y) = fi(xi) + yTAixi

◮ x-minimization in dual ascent splits into N separate minimizations

x_i^{k+1} := argmin_{xi} Li(xi, yk)

which can be carried out in parallel

Dual decomposition 6

Dual decomposition

◮ dual decomposition (Everett, Dantzig, Wolfe, Benders 1960–65)

x_i^{k+1} := argmin_{xi} Li(xi, yk),   i = 1, . . . , N

yk+1 := yk + αk ( ∑_{i=1}^N Ai x_i^{k+1} − b )

◮ scatter yk; update xi in parallel; gather Ai x_i^{k+1}

◮ solve a large problem

– by iteratively solving subproblems (in parallel)

– dual variable update provides coordination

◮ works, with lots of assumptions; often slow
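An illustrative sketch of the scatter/gather structure (assumptions, not from the slides: separable quadratic blocks fi(xi) = (1/2)‖xi − ci‖22 with closed-form local minimizers, and a small fixed step size):

    import numpy as np

    # N blocks coupled only through sum_i A_i x_i = b
    rng = np.random.default_rng(0)
    N, ni, m = 4, 5, 3
    As = [rng.standard_normal((m, ni)) for _ in range(N)]
    cs = [rng.standard_normal(ni) for _ in range(N)]
    b = rng.standard_normal(m)

    y = np.zeros(m)
    alpha = 0.02                              # assumed step size
    for k in range(2000):                     # "often slow": many iterations
        # scatter y: each block solves argmin_{x_i} f_i(x_i) + y^T A_i x_i
        xs = [c - A.T @ y for A, c in zip(As, cs)]
        # gather A_i x_i; the dual update provides the coordination
        y = y + alpha * (sum(A @ x for A, x in zip(As, xs)) - b)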

Dual decomposition 7

Outline

Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions

Method of multipliers 8

Method of multipliers

◮ a method to robustify dual ascent

◮ use augmented Lagrangian (Hestenes, Powell 1969), ρ > 0

Lρ(x, y) = f(x) + yT (Ax− b) + (ρ/2)‖Ax− b‖22

◮ method of multipliers (Hestenes, Powell; analysis in Bertsekas 1982)

xk+1 := argminx Lρ(x, yk)

yk+1 := yk + ρ(Axk+1 − b)

(note specific dual update step length ρ)
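The same toy problem as before gives a compact sketch of the method of multipliers (my example; the augmented term makes the x-update a regularized linear solve):

    import numpy as np

    # minimize (1/2)||x||^2 subject to Ax = b;
    # argmin_x L_rho(x, y) solves (I + rho A^T A) x = A^T (rho b - y)
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 10))
    b = rng.standard_normal(3)
    rho = 1.0

    y = np.zeros(3)
    for k in range(50):
        x = np.linalg.solve(np.eye(10) + rho * A.T @ A, A.T @ (rho * b - y))
        y = y + rho * (A @ x - b)             # dual update with step length rho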

Method of multipliers 9

Method of multipliers dual update step

◮ optimality conditions (for differentiable f):

Ax⋆ − b = 0, ∇f(x⋆) +AT y⋆ = 0

(primal and dual feasibility)

◮ since xk+1 minimizes Lρ(x, yk)

0 = ∇xLρ(xk+1, yk)
  = ∇xf(xk+1) + AT (yk + ρ(Axk+1 − b))
  = ∇xf(xk+1) + AT yk+1

◮ dual update yk+1 = yk + ρ(Axk+1 − b) makes (xk+1, yk+1) dual feasible

◮ primal feasibility achieved in limit: Axk+1 − b → 0

Method of multipliers 10

Method of multipliers

(compared to dual decomposition)

◮ good news: converges under much more relaxed conditions(f can be nondifferentiable, take on value +∞, . . . )

◮ bad news: quadratic penalty destroys splitting of the x-update, so can’t do decomposition

Method of multipliers 11

Outline

Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions

Alternating direction method of multipliers 12

Alternating direction method of multipliers

◮ a method

– with the good robustness of the method of multipliers

– which can support decomposition

◮ “robust dual decomposition” or “decomposable method of multipliers”

◮ proposed by Gabay, Mercier, Glowinski, Marrocco in 1976

Alternating direction method of multipliers 13

Alternating direction method of multipliers

◮ ADMM problem form (with f , g convex)

minimize   f(x) + g(z)
subject to Ax + Bz = c

– two sets of variables, with separable objective

◮ Lρ(x, z, y) = f(x) + g(z) + yT (Ax+Bz − c) + (ρ/2)‖Ax+Bz − c‖22

◮ ADMM:

xk+1 := argminx Lρ(x, zk, yk) // x-minimization

zk+1 := argminz Lρ(xk+1, z, yk) // z-minimization

yk+1 := yk + ρ(Axk+1 +Bzk+1 − c) // dual update

Alternating direction method of multipliers 14

Alternating direction method of multipliers

◮ if we minimized over x and z jointly, reduces to method of multipliers

◮ instead, we do one pass of a Gauss-Seidel method

◮ we get splitting since we minimize over x with z fixed, and vice versa

Alternating direction method of multipliers 15

ADMM and optimality conditions

◮ optimality conditions (for differentiable case):

– primal feasibility: Ax + Bz − c = 0

– dual feasibility: ∇f(x) + AT y = 0,   ∇g(z) + BT y = 0

◮ since zk+1 minimizes Lρ(xk+1, z, yk) we have

0 = ∇g(zk+1) + BT yk + ρBT (Axk+1 + Bzk+1 − c)
  = ∇g(zk+1) + BT yk+1

◮ so with ADMM dual variable update, (xk+1, zk+1, yk+1) satisfies the second dual feasibility condition

◮ primal and first dual feasibility are achieved as k → ∞

Alternating direction method of multipliers 16

ADMM with scaled dual variables

◮ combine linear and quadratic terms in augmented Lagrangian

Lρ(x, z, y) = f(x) + g(z) + yT (Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖22
            = f(x) + g(z) + (ρ/2)‖Ax + Bz − c + u‖22 + const.

with uk = (1/ρ)yk

◮ ADMM (scaled dual form):

xk+1 := argminx ( f(x) + (ρ/2)‖Ax + Bzk − c + uk‖22 )

zk+1 := argminz ( g(z) + (ρ/2)‖Axk+1 + Bz − c + uk‖22 )

uk+1 := uk + (Axk+1 + Bzk+1 − c)
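The scaled form translates nearly line-for-line into code. A generic skeleton (my sketch; the two subproblem solvers prox_x and prox_z are assumed to be supplied by the caller):

    import numpy as np

    def admm_scaled(prox_x, prox_z, A, B, c, rho=1.0, iters=200):
        # prox_x(v) must return argmin_x f(x) + (rho/2)||A x - v||^2
        # prox_z(v) must return argmin_z g(z) + (rho/2)||B z - v||^2
        x = np.zeros(A.shape[1])
        z = np.zeros(B.shape[1])
        u = np.zeros(c.shape[0])
        for _ in range(iters):
            x = prox_x(c - B @ z - u)         # x-update
            z = prox_z(c - A @ x - u)         # z-update
            u = u + A @ x + B @ z - c         # scaled dual update
        return x, z, u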

Alternating direction method of multipliers 17

Convergence

◮ assume (very little!)

– f , g convex, closed, proper

– L0 has a saddle point

◮ then ADMM converges:

– iterates approach feasibility: Axk + Bzk − c → 0

– objective approaches optimal value: f(xk) + g(zk) → p⋆

Alternating direction method of multipliers 18

Related algorithms

◮ operator splitting methods(Douglas, Peaceman, Rachford, Lions, Mercier, . . . 1950s, 1979)

◮ proximal point algorithm (Rockafellar 1976)

◮ Dykstra’s alternating projections algorithm (1983)

◮ Spingarn’s method of partial inverses (1985)

◮ Rockafellar-Wets progressive hedging (1991)

◮ proximal methods (Rockafellar, many others, 1976–present)

◮ Bregman iterative methods (2008–present)

◮ most of these are special cases of the proximal point algorithm

Alternating direction method of multipliers 19

Outline

Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions

Common patterns 20

Common patterns

◮ x-update step requires minimizing f(x) + (ρ/2)‖Ax − v‖22 (with v = −Bzk + c − uk, which is constant during the x-update)

◮ similar for z-update

◮ several special cases come up often

◮ can simplify update by exploiting structure in these cases

Common patterns 21

Decomposition

◮ suppose f is block-separable,

f(x) = f1(x1) + · · ·+ fN (xN ), x = (x1, . . . , xN )

◮ A is conformably block separable: ATA is block diagonal

◮ then x-update splits into N parallel updates of xi

Common patterns 22

Proximal operator

◮ consider x-update when A = I

x+ = argminx ( f(x) + (ρ/2)‖x − v‖22 ) = proxf,ρ(v)

◮ some special cases:

f = IC (indicator fct. of set C) x+ := ΠC(v) (projection onto C)

f = λ‖ · ‖1 (ℓ1 norm)   x_i^+ := Sλ/ρ(vi) (soft thresholding)

(Sa(v) = (v − a)+ − (−v − a)+)
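Both special cases are one-liners; a sketch (the box is just one example of a set C with a cheap projection):

    import numpy as np

    def soft_threshold(v, a):
        # S_a(v) = (v - a)_+ - (-v - a)_+, applied elementwise
        return np.maximum(v - a, 0.0) - np.maximum(-v - a, 0.0)

    def project_box(v, lo, hi):
        # Pi_C(v) for C = {x : lo <= x <= hi}
        return np.clip(v, lo, hi)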

Common patterns 23

Quadratic objective

◮ f(x) = (1/2)xTPx+ qTx+ r

◮ x+ := (P + ρATA)−1(ρAT v − q)

◮ use matrix inversion lemma when computationally advantageous

(P + ρATA)−1 = P−1 − ρP−1AT (I + ρAP−1AT )−1AP−1

◮ (direct method) cache factorization of P + ρATA (or I + ρAP−1AT )

◮ (iterative method) warm start, early stopping, reducing tolerances
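A sketch of the direct method with a cached factorization (toy data; SciPy's Cholesky helpers are one reasonable choice, assuming P + ρATA is positive definite):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    rng = np.random.default_rng(0)
    n, m, rho = 50, 30, 1.0
    P = np.eye(n)                             # illustrative PSD choice
    q = rng.standard_normal(n)
    A = rng.standard_normal((m, n))

    chol = cho_factor(P + rho * A.T @ A)      # factor once, reuse every iteration

    def x_update(v):
        # x+ = (P + rho A^T A)^{-1} (rho A^T v - q); two triangular solves
        return cho_solve(chol, rho * A.T @ v - q)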

Common patterns 24

Smooth objective

◮ f smooth

◮ can use standard methods for smooth minimization

– gradient, Newton, or quasi-Newton

– preconditioned CG, limited-memory BFGS (scale to very large problems)

◮ can exploit

– warm start

– early stopping, with tolerances decreasing as ADMM proceeds

Common patterns 25

Outline

Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions

Examples 26

Constrained convex optimization

◮ consider ADMM for generic problem

minimize   f(x)
subject to x ∈ C

◮ ADMM form: take g to be indicator of C

minimize   f(x) + g(z)
subject to x − z = 0

◮ algorithm:

xk+1 := argminx ( f(x) + (ρ/2)‖x − zk + uk‖22 )

zk+1 := ΠC(xk+1 + uk)

uk+1 := uk + xk+1 − zk+1
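As a concrete instance (my example, not from the slides): nonnegative least squares, with f(x) = (1/2)‖Fx − g‖22 and C the nonnegative orthant, so the projection is elementwise clipping:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10
    F = rng.standard_normal((30, n))
    g = rng.standard_normal(30)
    rho = 1.0

    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    M = np.linalg.inv(F.T @ F + rho * np.eye(n))   # cache across iterations
    for _ in range(200):
        x = M @ (F.T @ g + rho * (z - u))     # x-update (quadratic f)
        z = np.maximum(x + u, 0.0)            # z-update: projection onto C
        u = u + x - z                         # dual update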

Examples 27

Lasso

◮ lasso problem:

minimize (1/2)‖Ax− b‖22 + λ‖x‖1

◮ ADMM form:

minimize   (1/2)‖Ax − b‖22 + λ‖z‖1
subject to x − z = 0

◮ ADMM:

xk+1 := (ATA+ ρI)−1(AT b+ ρzk − yk)

zk+1 := Sλ/ρ(xk+1 + yk/ρ)

yk+1 := yk + ρ(xk+1 − zk+1)
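These three updates map directly to a short script. A NumPy sketch of the unscaled iteration above (toy-scale: the ridge-type matrix is inverted once up front rather than factored):

    import numpy as np

    def soft_threshold(v, a):
        return np.maximum(v - a, 0.0) - np.maximum(-v - a, 0.0)

    def lasso_admm(A, b, lam, rho=1.0, iters=100):
        n = A.shape[1]
        z = np.zeros(n); y = np.zeros(n)
        Atb = A.T @ b
        M = np.linalg.inv(A.T @ A + rho * np.eye(n))   # "same as ridge regression"
        for _ in range(iters):
            x = M @ (Atb + rho * z - y)
            z = soft_threshold(x + y / rho, lam / rho)
            y = y + rho * (x - z)
        return z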

Examples 28

Lasso example

◮ example with dense A ∈ R1500×5000

(1500 measurements; 5000 regressors)

◮ computation times

factorization (same as ridge regression) 1.3s

subsequent ADMM iterations 0.03s

lasso solve (about 50 ADMM iterations) 2.9s

full regularization path (30 λ’s) 4.4s

◮ not bad for a very short Matlab script

Examples 29

Sparse inverse covariance selection

◮ S: empirical covariance of samples from N (0,Σ), with Σ−1 sparse (i.e., Gaussian Markov random field)

◮ estimate Σ−1 via ℓ1 regularized maximum likelihood

minimize Tr(SX)− log detX + λ‖X‖1

◮ methods: COVSEL (Banerjee et al 2008), graphical lasso (FHT 2008)

Examples 30

Sparse inverse covariance selection via ADMM

◮ ADMM form:

minimize   Tr(SX) − log detX + λ‖Z‖1
subject to X − Z = 0

◮ ADMM:

Xk+1 := argminX ( Tr(SX) − log detX + (ρ/2)‖X − Zk + Uk‖2F )

Zk+1 := Sλ/ρ(Xk+1 + Uk)

Uk+1 := Uk + (Xk+1 − Zk+1)

Examples 31

Analytical solution for X-update

◮ compute eigendecomposition ρ(Zk − Uk)− S = QΛQT

◮ form diagonal matrix X̃ with

X̃ii = (λi + √(λi^2 + 4ρ)) / (2ρ)

◮ let Xk+1 := QX̃QT

◮ cost of X-update is an eigendecomposition
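A sketch of this X-update (names are mine; the update solves rho X − X^{-1} = ρ(Zk − Uk) − S via the eigendecomposition above):

    import numpy as np

    def x_update(S, Z, U, rho):
        # rho (Z - U) - S is symmetric, so eigh applies
        lam, Q = np.linalg.eigh(rho * (Z - U) - S)
        x = (lam + np.sqrt(lam ** 2 + 4 * rho)) / (2 * rho)
        return Q @ np.diag(x) @ Q.T           # X^{k+1} = Q Xtilde Q^T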

Examples 32

Sparse inverse covariance selection example

◮ Σ−1 is 1000 × 1000 with 10^4 nonzeros

– graphical lasso (Fortran): 20 seconds – 3 minutes

– ADMM (Matlab): 3 – 10 minutes

– (depends on choice of λ)

◮ very rough experiment, but with no special tuning, ADMM is in ballparkof recent specialized methods

◮ (for comparison, COVSEL takes 25+ min when Σ−1 is a 400 × 400 tridiagonal matrix)

Examples 33

Outline

Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions

Consensus and exchange 34

Consensus optimization

◮ want to solve problem with N objective terms

minimize ∑_{i=1}^N fi(x)

– e.g., fi is the loss function for ith block of training data

◮ ADMM form:

minimize   ∑_{i=1}^N fi(xi)
subject to xi − z = 0

– xi are local variables

– z is the global variable

– xi − z = 0 are consistency or consensus constraints

– can add regularization using a g(z) term

Consensus and exchange 35

Consensus optimization via ADMM

◮ Lρ(x, z, y) = ∑_{i=1}^N ( fi(xi) + y_i^T (xi − z) + (ρ/2)‖xi − z‖22 )

◮ ADMM:

x_i^{k+1} := argmin_{xi} ( fi(xi) + (y_i^k)T (xi − zk) + (ρ/2)‖xi − zk‖22 )

zk+1 := (1/N) ∑_{i=1}^N ( x_i^{k+1} + (1/ρ) y_i^k )

y_i^{k+1} := y_i^k + ρ (x_i^{k+1} − zk+1)

◮ with regularization, averaging in the z-update is followed by proxg,ρ

Consensus and exchange 36

Consensus optimization via ADMM

◮ using ∑_{i=1}^N y_i^k = 0, algorithm simplifies to

x_i^{k+1} := argmin_{xi} ( fi(xi) + (y_i^k)T (xi − x̄k) + (ρ/2)‖xi − x̄k‖22 )

y_i^{k+1} := y_i^k + ρ (x_i^{k+1} − x̄k+1)

where x̄k = (1/N) ∑_{i=1}^N x_i^k

◮ in each iteration

– gather x_i^k and average to get x̄k

– scatter the average x̄k to processors

– update y_i^k locally (in each processor, in parallel)

– update xi locally
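A serial NumPy sketch of this gather/average/scatter loop (assumptions: quadratic local terms fi(x) = (1/2)‖Aix − bi‖22 so each local update is a linear solve; in practice the inner loop runs in parallel):

    import numpy as np

    rng = np.random.default_rng(0)
    N, n, rho = 5, 8, 1.0
    As = [rng.standard_normal((20, n)) for _ in range(N)]
    bs = [rng.standard_normal(20) for _ in range(N)]

    X = np.zeros((N, n)); Y = np.zeros((N, n))
    for _ in range(100):
        xbar = X.mean(axis=0)                 # gather and average
        for i in range(N):                    # in parallel, in principle
            # argmin f_i(x) + y_i^T (x - xbar) + (rho/2)||x - xbar||^2
            X[i] = np.linalg.solve(As[i].T @ As[i] + rho * np.eye(n),
                                   As[i].T @ bs[i] - Y[i] + rho * xbar)
        Y += rho * (X - X.mean(axis=0))       # local dual updates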

Consensus and exchange 37

Statistical interpretation

◮ fi is negative log-likelihood for parameter x given ith data block

◮ x_i^{k+1} is MAP estimate under prior N(x̄k + (1/ρ) y_i^k, ρI)

◮ prior mean is previous iteration’s consensus shifted by ‘price’ of processor i disagreeing with previous consensus

◮ processors only need to support a Gaussian MAP method

– type or number of data in each block not relevant

– consensus protocol yields global maximum-likelihood estimate

Consensus and exchange 38

Consensus classification

◮ data (examples) (ai, bi), i = 1, . . . , N , ai ∈ Rn, bi ∈ {−1,+1}

◮ linear classifier sign(aTw + v), with weight w, offset v

◮ margin for ith example is bi(aTi w + v); want margin to be positive

◮ loss for ith example is l(bi(aTi w + v))

– l is loss function (hinge, logistic, probit, exponential, . . . )

◮ choose w, v to minimize (1/N) ∑_{i=1}^N l(bi(aTi w + v)) + r(w)

– r(w) is regularization term (ℓ2, ℓ1, . . . )

◮ split data and use ADMM consensus to solve

Consensus and exchange 39

Consensus SVM example

◮ hinge loss l(u) = (1− u)+ with ℓ2 regularization

◮ baby problem with n = 2, N = 400 to illustrate

◮ examples split into 20 groups, in worst possible way: each group contains only positive or negative examples

Consensus and exchange 40

Iteration 1

(figure omitted)

Consensus and exchange 41

Iteration 5

(figure omitted)

Consensus and exchange 42

Iteration 40

(figure omitted)

Consensus and exchange 43

Distributed lasso example

◮ example with dense A ∈ R400000×8000 (roughly 30 GB of data)

– distributed solver written in C using MPI and GSL

– no optimization or tuned libraries (like ATLAS, MKL)

– split into 80 subsystems across 10 (8-core) machines on Amazon EC2

◮ computation times

loading data 30s

factorization 5m

subsequent ADMM iterations 0.5–2s

lasso solve (about 15 ADMM iterations) 5–6m

Consensus and exchange 44

Exchange problem

minimize   ∑_{i=1}^N fi(xi)
subject to ∑_{i=1}^N xi = 0

◮ another canonical problem, like consensus

◮ in fact, it’s the dual of consensus

◮ can interpret as N agents exchanging n goods to minimize a total cost

◮ (xi)j ≥ 0 means agent i receives (xi)j of good j from exchange

◮ (xi)j < 0 means agent i contributes |(xi)j | of good j to exchange

◮ constraint ∑_{i=1}^N xi = 0 is equilibrium or market clearing constraint

◮ optimal dual variable y⋆ is a set of valid prices for the goods

◮ suggests real or virtual cash payment (y⋆)Txi by agent i

Consensus and exchange 45

Exchange ADMM

◮ solve as a generic constrained convex problem with constraint set

C = {x ∈ RnN | x1 + x2 + · · ·+ xN = 0}

◮ scaled form:

x_i^{k+1} := argmin_{xi} ( fi(xi) + (ρ/2)‖xi − x_i^k + x̄k + uk‖22 )

uk+1 := uk + x̄k+1

◮ unscaled form:

x_i^{k+1} := argmin_{xi} ( fi(xi) + ykT xi + (ρ/2)‖xi − (x_i^k − x̄k)‖22 )

yk+1 := yk + ρ x̄k+1
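A sketch of the scaled iteration with quadratic agent costs fi(xi) = (1/2)‖xi − di‖22 (an illustrative assumption, chosen so each x-update is closed form):

    import numpy as np

    rng = np.random.default_rng(0)
    N, n, rho = 4, 6, 1.0
    D = rng.standard_normal((N, n))           # each agent's preferred profile

    X = np.zeros((N, n)); u = np.zeros(n)
    for _ in range(100):
        xbar = X.mean(axis=0)
        # x_i-update: argmin (1/2)||x - d_i||^2
        #             + (rho/2)||x - x_i^k + xbar + u||^2  (vectorized over i)
        X = (D + rho * (X - xbar - u)) / (1 + rho)
        u = u + X.mean(axis=0)                # price update; mean(X) -> 0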

Consensus and exchange 46

Interpretation as tatonnement process

◮ tatonnement process: iteratively update prices to clear market

◮ work towards equilibrium by increasing/decreasing prices of goods based on excess demand/supply

◮ dual decomposition is the simplest tatonnement algorithm

◮ ADMM adds proximal regularization

– incorporate agents’ prior commitment to help clear market

– convergence is far more robust than for dual decomposition

Consensus and exchange 47

Distributed dynamic energy management

◮ N devices exchange power in time periods t = 1, . . . , T

◮ xi ∈ RT is power flow profile for device i

◮ fi(xi) is cost of profile xi (and encodes constraints)

◮ x1 + · · ·+ xN = 0 is energy balance (in each time period)

◮ dynamic energy management problem is exchange problem

◮ exchange ADMM gives distributed method for dynamic energy management

◮ each device optimizes its own profile, with quadratic regularization for coordination

◮ residual (energy imbalance) is driven to zero

Consensus and exchange 48

Generators


◮ 3 example generators

◮ left: generator costs/limits; right: ramp constraints

◮ can add cost for power changes

Consensus and exchange 49

Fixed loads


◮ 2 example fixed loads

◮ cost is +∞ for not supplying load; zero otherwise

Consensus and exchange 50

Shiftable load


◮ total energy consumed over an interval must exceed given minimum level

◮ limits on energy consumed in each period

◮ cost is +∞ for violating constraints; zero otherwise

Consensus and exchange 51

Battery energy storage system


◮ energy store with maximum capacity, charge/discharge limits

◮ black: battery charge, red: charge/discharge profile

◮ cost is +∞ for violating constraints; zero otherwise

Consensus and exchange 52

Electric vehicle charging system


◮ black: desired charge profile; blue: charge profile

◮ shortfall cost for not meeting desired charge

Consensus and exchange 53

HVAC


◮ thermal load (e.g., room, refrigerator) with temperature limits

◮ magenta: ambient temperature; blue: load temperature

◮ red: cooling energy profile

◮ cost is +∞ for violating constraints; zero otherwise

Consensus and exchange 54

External tie


◮ buy/sell energy from/to external grid at price pext(t)± γ(t)

◮ solid: pext(t); dashed: pext(t)± γ(t)

Consensus and exchange 55

Smart grid example

10 devices (already described above)

◮ 3 generators

◮ 2 fixed loads

◮ 1 shiftable load

◮ 1 EV charging system

◮ 1 battery

◮ 1 HVAC system

◮ 1 external tie

Consensus and exchange 56

Convergence

iteration: k = 1

◮ left: solid: optimal generator profile, dashed: profile at kth iteration

◮ right: residual vector x̄k

Consensus and exchange 57

Convergence

iteration: k = 3

◮ left: solid: optimal generator profile, dashed: profile at kth iteration

◮ right: residual vector x̄k

Consensus and exchange 57

Convergence

iteration: k = 5

◮ left: solid: optimal generator profile, dashed: profile at kth iteration

◮ right: residual vector x̄k

Consensus and exchange 57

Convergence

iteration: k = 10

◮ left: solid: optimal generator profile, dashed: profile at kth iteration

◮ right: residual vector x̄k

Consensus and exchange 57

Convergence

iteration: k = 15

◮ left: solid: optimal generator profile, dashed: profile at kth iteration

◮ right: residual vector x̄k

Consensus and exchange 57

Convergence

iteration: k = 20

◮ left: solid: optimal generator profile, dashed: profile at kth iteration

◮ right: residual vector x̄k

Consensus and exchange 57

Convergence

iteration: k = 25

◮ left: solid: optimal generator profile, dashed: profile at kth iteration

◮ right: residual vector x̄k

Consensus and exchange 57

Convergence

iteration: k = 30

◮ left: solid: optimal generator profile, dashed: profile at kth iteration

◮ right: residual vector x̄k

Consensus and exchange 57

Convergence

iteration: k = 35

◮ left: solid: optimal generator profile, dashed: profile at kth iteration

◮ right: residual vector x̄k

Consensus and exchange 57

Convergence

iteration: k = 40

◮ left: solid: optimal generator profile, dashed: profile at kth iteration

◮ right: residual vector x̄k

Consensus and exchange 57

Convergence

iteration: k = 45

◮ left: solid: optimal generator profile, dashed: profile at kth iteration

◮ right: residual vector x̄k

Consensus and exchange 57

Convergence

iteration: k = 50

◮ left: solid: optimal generator profile, dashed: profile at kth iteration

◮ right: residual vector x̄k

Consensus and exchange 57

Outline

Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions

Conclusions 58

Summary and conclusions

ADMM

◮ is the same as, or closely related to, many methods with other names

◮ has been around since the 1970s

◮ gives simple single-processor algorithms that can be competitive with state-of-the-art methods

◮ can be used to coordinate many processors, each solving a substantial problem, to solve a very large problem

Conclusions 59