ENE 2XX: Renewable Energy Systems and Control
LEC 02 : Convex Programs
Professor Scott Moura, University of California, Berkeley
Summer 2017
Prof. Moura | Tsinghua-Berkeley Shenzhen Institute ENE 2XX | LEC 02 - Convex Programs Slide 1
What is an Optimization Program?

We seek "the best" values for design variables x ∈ R^n
Must respect certain constraints / limitations

minimize f(x)  [Objective Function]
subject to g_i(x) ≤ 0, i = 1, ..., m  [Inequality constraints]
h_j(x) = 0, j = 1, ..., l  [Equality constraints]

A value x* that solves this optimization program is called a "minimizer".
What is an Optimization Program?

We seek "the best" values for design variables x ∈ R^n
Must respect certain constraints / limitations

minimize f(x)  [Objective Function]
subject to g(x) ≤ 0  [Inequality constraints]
h(x) = 0  [Equality constraints]

Vector notation

A value x* that solves this optimization program is called a "minimizer".
Classes of Optimization Programs

[Figure: nested classes of optimization programs]

LP = Linear Program; QP = Quadratic Program; CP = Convex Program; NLP = Nonlinear Program; MIP = Mixed Integer Program
Outline
1 Convex Programming
2 Linear Programming
3 Quadratic Programming
4 Second Order Cone Programming
5 Robust Programming & Chance Constraints
6 Maximum Likelihood Estimation
Convex Programs

A convex optimization problem has the form

minimize f(x)    (1)
subject to g_i(x) ≤ 0, i = 1, ..., m    (2)
a_j^T x = b_j, j = 1, ..., l    (3)

Comparing this problem with the abstract optimization problem defined before, the convex optimization problem has three additional requirements:
the objective function f(x) must be convex,
the inequality constraint functions g_i(x) must be convex for all i = 1, ..., m,
the equality constraint functions h_j(x) must be affine for all j = 1, ..., l.

Note that in the convex optimization problem, we can only tolerate affine equality constraints, meaning (3) takes the matrix-vector form A_eq x = b_eq.
Why care?

No general analytic solutions exist; however, VERY powerful methods exist to solve CPs numerically
Ex: Easily solve CPs with 100's or 1000's of variables in just a few seconds
Ex: Easily solve CPs with millions of variables in tens of seconds
CP solvers are off-the-shelf technology
YOUR focus: Find ways to convert your problem into a CP
If you formulate your problem as a CP, then you have essentially solved it
Converting your problem into a CP requires both art & technical skill
Key CP Properties
If a local minimum exists, then it is the global minimum.
If the objective function is strictly convex, and a local minimum exists, then it is the unique minimum.
Sub-classes of Convex Programs
Linear Programs (LPs)
Some Quadratic Programs (QPs)
Second Order Cone Programs (SOCPs)
Maximum Likelihood Estimation (MLE)
Geometric Programs (GPs)
Semidefinite Programs (SDPs)
Exercise

Is this a convex program?

minimize f(x) = x_1^2 + x_2^2    (4)
subject to g_1(x) = x_1/(1 + x_2^2) ≤ 0    (5)
h_1(x) = (x_1 + x_2)^2 = 0    (6)

NOT a convex program.
Inequality constraint function g_1(x) is not convex in (x_1, x_2)
Equality constraint function h_1(x) is not affine in (x_1, x_2)

Now, an astute observer might comment that both sides of (5) can be multiplied by (1 + x_2^2) and (6) can be represented simply by x_1 + x_2 = 0, without loss of generality.
Exercise
Is this a convex program?
minimize f(x) = x_1^2 + x_2^2    (4)
subject to g1(x) = x1 ≤ 0 (5)
h1(x) = x1 + x2 = 0 (6)
YES. This is a convex program.
Objective function f(x) is convex in (x1, x2)
Inequality constraint function g1(x) is convex in (x1, x2)
Equality constraint function h1(x) is affine in (x1, x2)
Linear Programs
Linear program (LP) is defined as the following special case of a CP:
minimize cTx (7)
subject to Ax ≤ b (8)
Aeqx = beq (9)
f(x) must be linear (or affine, before dropping the additive constant)
g_i(x) and h_j(x) must be affine for all i and j, respectively.

Figure: The feasible set of an LP always forms a polyhedron P. The objective function is visualized as isolines of constant cost (dotted lines). The optimal solution is at the boundary point that touches the isoline of least cost.
Nature of LP Solutions
Proposition (Nature of LP Solutions)
The solution to any linear program is characterized by one of the following three categories:

[No Solution] Occurs when the feasible set is empty, or the objective function is unbounded.
[One Unique Solution] There exists a single unique solution at a vertex of the feasible set. That is, at least two constraints are active and their intersection gives the optimal solution. (see previous slide)
[A Non-Unique Solution] There exists an infinite number of solutions, given by one edge of the feasible set. That is, one or more constraints are active and all solutions along the intersection of these constraints are equally optimal. This can only occur when the objective function gradient is orthogonal to one or more constraints.
LP Examples

Diet Problem: choose quantities x_1, ..., x_n of n foods
one unit of food j costs c_j, contains amount a_ij of nutrient i
a healthy diet requires nutrient i in quantity at least b_i
to find the cheapest healthy diet:

minimize c^T x
subject to Ax ≥ b, x ≥ 0

Minimize a piecewise affine (PWA) function:

minimize max_{i=1,...,m} { a_i^T x + b_i }

is equivalent to the LP

minimize t
subject to a_i^T x + b_i ≤ t, ∀ i = 1, ..., m
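The epigraph trick above can be sketched numerically. The scalar example below (made up for illustration; the slides do not prescribe a solver) minimizes max(x, −x, 0.5x + 1) by solving the equivalent LP in the variables (x, t) with SciPy's linprog:

```python
import numpy as np
from scipy.optimize import linprog

# PWA function: max_i (a_i * x + b_i), with three affine pieces (illustrative)
a = np.array([1.0, -1.0, 0.5])
b = np.array([0.0, 0.0, 1.0])

# Epigraph LP in variables z = [x, t]: minimize t  s.t.  a_i*x - t <= -b_i
c = np.array([0.0, 1.0])                   # objective: minimize t
A_ub = np.column_stack([a, -np.ones(3)])   # rows: [a_i, -1]
b_ub = -b
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None), (None, None)])

x_opt, t_opt = res.x
# the minimum of max(x, -x, 0.5x+1) is 2/3, attained at x = -2/3
```

At the optimum the two active pieces (−x and 0.5x + 1) intersect, illustrating the "at least two constraints active" case from the proposition above.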
Optimal Economic Dispatch

You are the California Independent System Operator (CAISO). You must schedule power generators for tomorrow (24 one-hour segments) to satisfy electricity demand. Given data:
Generator i provides "marginal cost" c_i (units of USD/MW). Quantity c_i is the financial compensation each generator requests for providing 1 MW.
Generator i has maximum power capacity of x_i,max (units of MW).
California electricity demand is D(k), where k indexes each hour, i.e. k = 0, 1, ..., 23.

minimize sum_{k=0}^{23} sum_{i=1}^{n} c_i x_i(k)    (10)
subject to 0 ≤ x_i(k) ≤ x_i,max, ∀ i = 1, ..., n, k = 0, ..., 23    (11)
sum_{i=1}^{n} x_i(k) = D(k), k = 0, ..., 23    (12)

The optimization variable x_i(k) is the power produced by generator i during hour k.
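A minimal single-hour instance of (10)-(12) can be sketched with SciPy's linprog. The three generators, costs, and demand below are made-up numbers, not data from the lecture:

```python
import numpy as np
from scipy.optimize import linprog

ci = np.array([10.0, 20.0, 50.0])    # marginal costs [USD/MW] (hypothetical)
xmax = np.array([30.0, 20.0, 50.0])  # capacities [MW] (hypothetical)
D = 60.0                             # demand for this hour [MW]

# minimize c^T x  s.t.  sum_i x_i = D,  0 <= x_i <= x_i,max
res = linprog(ci, A_eq=np.ones((1, 3)), b_eq=[D],
              bounds=list(zip(np.zeros(3), xmax)))

# cheapest generators fill up first: x = [30, 20, 10], total cost = 1200
```

The solution reproduces the "merit order" behavior in the figures that follow: capacity is taken from the cheapest generators until demand is met.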
Optimal Economic Dispatch

Figure: [LEFT] Marginal cost of electricity for various generators, as a function of cumulative capacity. The purple line indicates the total demand D(k). All generators left of the purple line are dispatched. [RIGHT] Optimal supply mix and demand for 03:00.
Optimal Economic Dispatch

Figure: [LEFT] Marginal cost of electricity for various generators, as a function of cumulative capacity. The purple line indicates the total demand D(k). All generators left of the purple line are dispatched. [RIGHT] Optimal supply mix and demand for 19:00.
Quadratic Programs

A quadratic program (QP) is defined as:

minimize (1/2) x^T Q x + R^T x + S    (10)
subject to Ax ≤ b    (11)
A_eq x = b_eq    (12)

f(x) must be quadratic in x
g_i(x) and h_j(x) must be affine for all i and j, respectively.

Figure: The feasible set of a QP always forms a polyhedron P. The objective function is visualized as convex quadratic iso-contours of constant cost (dotted lines).
Remark
Not all QPs are convex programs! A QP is a convex program only if Q ⪰ 0, i.e. Q is positive semi-definite. QPs where Q is not positive semi-definite are called non-convex QPs and are generally very hard to solve.
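The remark can be checked numerically: a QP is convex exactly when every eigenvalue of (the symmetric part of) Q is nonnegative. A small NumPy sketch with illustrative matrices, not ones from the slides:

```python
import numpy as np

def is_convex_qp(Q, tol=1e-10):
    """A QP with Hessian Q is convex iff Q is positive semi-definite,
    i.e. all eigenvalues of the symmetric part are >= 0."""
    Qs = 0.5 * (Q + Q.T)              # only the symmetric part affects x^T Q x
    return np.linalg.eigvalsh(Qs).min() >= -tol

Q_convex = np.array([[2.0, 0.0], [0.0, 1.0]])      # eigenvalues 2, 1
Q_nonconvex = np.array([[1.0, 0.0], [0.0, -1.0]])  # eigenvalue -1 < 0
```
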
Linear Regression Models
more specifically, linear-in-the-parameters models

Suppose you have data comprised of n data pairs (x_i, y_i), where i = 1, ..., n. You seek to fit a mathematical model to this data, of the form:

y = θ_1 x + θ_0    (13)

How do we determine θ_1, θ_0?

Regression Analysis
Establish a mathematical relationship between variables, given data.

Quoted Text Message from Tech IP Attorney to Me
Attorney: One of our outside consultant firms billed us 170,000 USD to do an SQL regression model
Me: My undergrads would do that for 25 USD and pizza.
Attorney: yeah, next time we should go that route
Graphical Version
Determine a "best fit" for m, b in the linear model

y = mx + b    (14)

given n data pairs (x_i, y_i), where i = 1, ..., n.
In other words, find the line that best fits the data points:
Least Squares
a.k.a. Ordinary Least Squares (OLS) or Linear Least Squares

Let us define best fit as follows. Define the "residual" r_i for m, b and data pair (x_i, y_i) as follows:

r_i = m x_i + b − y_i    (15)

Obviously, when r_i = 0, then m, b fit that data pair perfectly. We would like to select m, b such that the sum of all squared residuals is minimized:

min_{m,b} sum_{i=1}^{n} r_i^2 = min_{m,b} sum_{i=1}^{n} (m x_i + b − y_i)^2    (16)
                             = min_{θ=[m,b]} ‖Xθ − Y‖_2^2    (17)

where

θ = [m; b],  X = [x_1 1; x_2 1; ...; x_n 1],  Y = [y_1; y_2; ...; y_n]    (18)
Graphical Result

[Figure: best-fit line through the data points]
Other Linear-in-the-Parameter Models

Polynomial: y = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_p x^p. Residual r = Xθ − Y

θ = [θ_0; θ_1; ...; θ_p],  X = [1 x_1 x_1^2 ... x_1^p; 1 x_2 x_2^2 ... x_2^p; ...; 1 x_n x_n^2 ... x_n^p],  Y = [y_1; y_2; ...; y_n]    (19)

Harmonic: y = θ_1 sin(x) + θ_2 cos(x) + θ_3 sin(2x) + θ_4 cos(2x). Residual r = Xθ − Y

θ = [θ_1; θ_2; θ_3; θ_4],  X = [sin(x_1) cos(x_1) sin(2x_1) cos(2x_1); sin(x_2) cos(x_2) sin(2x_2) cos(2x_2); ...; sin(x_n) cos(x_n) sin(2x_n) cos(2x_n)],  Y = [y_1; y_2; ...; y_n]    (20)
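Both design matrices above are assembled column by column. A NumPy sketch for the polynomial case (19) and the harmonic case (20), with made-up sample points:

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0, 1.5])  # sample points (illustrative)
p = 3                               # polynomial degree

# Polynomial design matrix (19): columns 1, x, x^2, ..., x^p
X_poly = np.column_stack([x**k for k in range(p + 1)])

# Harmonic design matrix (20): columns sin(x), cos(x), sin(2x), cos(2x)
X_harm = np.column_stack([np.sin(x), np.cos(x), np.sin(2*x), np.cos(2*x)])
```

Either matrix then plugs directly into the least-squares machinery on the following slides, since the model stays linear in θ regardless of how nonlinear the columns are in x.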
Other Linear-in-the-Parameter Models

Radial Basis Function: y = θ_1 e^{−(x+0.5)^2} + θ_2 e^{−x^2} + θ_3 e^{−(x−0.5)^2}. Residual r = Xθ − Y

θ = [θ_1; θ_2; θ_3],  X = [e^{−(x_1+0.5)^2} e^{−x_1^2} e^{−(x_1−0.5)^2}; e^{−(x_2+0.5)^2} e^{−x_2^2} e^{−(x_2−0.5)^2}; ...; e^{−(x_n+0.5)^2} e^{−x_n^2} e^{−(x_n−0.5)^2}],  Y = [y_1; y_2; ...; y_n]    (21)

limited only by your imagination
Optimization Perspective

All regression problems for linear-in-the-parameters models can be written:

minimize_θ ‖Xθ − Y‖_2^2,  X ∈ R^{n×p}, θ ∈ R^p, Y ∈ R^n    (22)

n : number of data pairs (x_i, y_i)
p : number of coefficients θ_1, ..., θ_p.
We assume n > p.

Recall the First Order Necessary Condition (FONC): If θ* minimizes (22), then d/dθ ‖Xθ − Y‖_2^2 = 0. Let's expand this condition!

0 = d/dθ ‖Xθ − Y‖_2^2
  = d/dθ (Xθ − Y)^T (Xθ − Y)
  = d/dθ (θ^T X^T X θ − 2 Y^T X θ + Y^T Y)
  = 2 X^T X θ − 2 X^T Y
⇒ θ* = (X^T X)^{−1} X^T Y
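The closed form θ* = (X^T X)^{−1} X^T Y can be verified on noise-free synthetic data; the line y = 2x + 1 below is an illustration, not course data:

```python
import numpy as np

xi = np.linspace(0.0, 1.0, 10)
yi = 2.0 * xi + 1.0                          # exact line: m = 2, b = 1

X = np.column_stack([xi, np.ones_like(xi)])  # rows [x_i, 1], so theta = [m, b]
Y = yi

# theta* = (X^T X)^{-1} X^T Y; solving the normal equations is preferred
# over forming an explicit matrix inverse
theta = np.linalg.solve(X.T @ X, X.T @ Y)
```

With noise-free data the fit recovers [m, b] = [2, 1] exactly (up to floating point), since the FONC has a unique solution when n > p and X has full column rank.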
Least Squares with L2 Regularization
a.k.a. Ridge Regression

What if we define "best fit" by a different criterion? For example, we minimize the sum of squared residuals, but penalize the coefficients from getting "too big". Consider

minimize_θ ‖Xθ − Y‖_2^2 + α‖θ‖_2^2    (23)

Apply the FONC:

0 = d/dθ (‖Xθ − Y‖_2^2 + α‖θ‖_2^2)
  = d/dθ ((Xθ − Y)^T (Xθ − Y) + α θ^T θ)
  = d/dθ (θ^T (X^T X + αI) θ − 2 Y^T X θ + Y^T Y)
  = 2 (X^T X + αI) θ − 2 X^T Y
⇒ θ* = (X^T X + αI)^{−1} X^T Y
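Ridge differs from OLS only by the αI term. The sketch below (same illustrative line data as before, not course data) shows the coefficient norm shrinking once α > 0:

```python
import numpy as np

xi = np.linspace(0.0, 1.0, 10)
Y = 2.0 * xi + 1.0                           # exact line: m = 2, b = 1
X = np.column_stack([xi, np.ones_like(xi)])

def ridge(X, Y, alpha):
    """theta* = (X^T X + alpha*I)^{-1} X^T Y via the regularized normal equations."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ Y)

theta_ols = ridge(X, Y, 0.0)   # alpha = 0 recovers ordinary least squares
theta_r = ridge(X, Y, 1.0)     # alpha > 0 shrinks the coefficients toward zero
```

Note a practical side benefit: X^T X + αI is always invertible for α > 0, even when X^T X alone is singular.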
Ridge Coefficients as you vary α
Least Squares with L1 Regularization
a.k.a. Lasso Regression

What if we define "best fit" by a different criterion? For example, suppose our data occasionally contains outliers that can bias our fitted linear model undesirably. Is there a "robust regression" method? Yes.

min_θ ‖Xθ − Y‖_2^2 + α‖θ‖_1    (24)

L2 penalties place small weight on small coefficients
θ_i^2 is very small when θ_i is small
Little incentive to drive θ_i to zero, unless you consider |θ_i| instead.

Note: Due to the 1-norm, this is no longer a QP! It is, however, a CP.
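Problem (24) has no closed form, but it is easy to solve numerically. One simple method, not covered in these slides, is proximal gradient descent (ISTA): alternate a gradient step on the smooth term with a soft-threshold that pushes small coefficients exactly to zero. A sketch on illustrative synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
Y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.01 * rng.standard_normal(n)

alpha = 5.0
# step = 1/L, where L = 2*lambda_max(X^T X) is the gradient's Lipschitz constant
step = 1.0 / (2.0 * np.linalg.eigvalsh(X.T @ X).max())

theta = np.zeros(p)
for _ in range(2000):
    z = theta - step * 2.0 * X.T @ (X @ theta - Y)   # gradient step on ||X theta - Y||^2
    theta = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)  # soft-threshold

def lasso_obj(t):
    return np.sum((X @ t - Y)**2) + alpha * np.sum(np.abs(t))

theta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]     # unregularized fit, for comparison
```

The soft-threshold is exactly why L1 regularization zeros out coefficients: any coordinate whose magnitude falls below step*alpha is clipped to zero, which an L2 penalty never does.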
y = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_8 x^8

[Figure: battery voltage vs. SOC data with LSQ fit]

θ_0 = 2.6460; θ_1 = 5.5442; θ_2 = −15.7690; θ_3 = 16.4894; θ_4 = −0.9965; θ_5 = −4.2202; θ_6 = −2.8927; θ_7 = 0.0602; θ_8 = 2.6326
y = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_8 x^8

[Figure: battery voltage vs. SOC data with L1-regularized fit, α = 0.0042]

θ_0 = 2.9873; θ_1 = 0.9366; θ_2 = −0.5531; θ_3 = −0.0641; θ_4 = 0; θ_5 = 0; θ_6 = 0; θ_7 = 0; θ_8 = 0.1052
y = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_8 x^8

[Figure: battery voltage vs. SOC data with L1-regularized fit, α = 0.0083]

θ_0 = 3.0579; θ_1 = 0.4773; θ_2 = 0; θ_3 = −0.1202; θ_4 = 0; θ_5 = 0; θ_6 = 0; θ_7 = 0; θ_8 = 0
y = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_8 x^8

[Figure: battery voltage vs. SOC data with L1-regularized fit, α = 0.0125]

θ_0 = 3.0889; θ_1 = 0.3547; θ_2 = 0; θ_3 = 0; θ_4 = 0; θ_5 = 0; θ_6 = 0; θ_7 = 0; θ_8 = 0
y = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_8 x^8

[Figure: battery voltage vs. SOC data with L2-regularized fits for α = 0.000, 0.001, 0.010]

α = 0.000: θ_0 = 2.1; θ_1 = 28.8; θ_2 = −301; θ_3 = 1595; θ_4 = −4743; θ_5 = 8222; θ_6 = −8238; θ_7 = 4414; θ_8 = −977
α = 0.001: θ_0 = 2.71; θ_1 = 4.13; θ_2 = −8.51; θ_3 = 4.10; θ_4 = 3.42; θ_5 = −0.34; θ_6 = −2.31; θ_7 = −1.49; θ_8 = 1.75
α = 0.010: θ_0 = 2.82; θ_1 = 2.40; θ_2 = −2.81; θ_3 = −0.33; θ_4 = 0.71; θ_5 = 0.69; θ_6 = 0.32; θ_7 = −0.04; θ_8 = −0.27
A Generalized Linear Model

The generalized linear model is given by:

y = sum_{i=1}^{p} θ_i φ_i(x) = θ^T φ(x)    (25)

y ∈ R is the output of interest
θ ∈ R^p are the coefficients or parameters to fit
φ(x) are "regressors" or "predictors", which can involve the data in a nonlinear way

Summary of Regression Procedures
Least Squares (LSQ), a.k.a. linear least squares, ordinary least squares (convex QP)
LSQ w/ L2 Regularization, a.k.a. ridge regression (convex QP)
LSQ w/ L1 Regularization, a.k.a. lasso regression (CP)
LSQ w/ L1 and L2 Regularization, a.k.a. elastic net (CP)
LSQ w/ Huber Regularization (hybridized L1+L2), a.k.a. robust LSQ (CP)
Markowitz Portfolio Optimization - I

Problem Statement: Imagine you are an investment portfolio manager. You control a large sum of money, and can invest in n different assets. At the end of some time period, your investment produces a financial return. The key challenge, here, is that the return is not easily predictable. It is random.

Notation:
x_i denotes the percentage of the fund to invest in asset i. Note that sum_{i=1}^{n} x_i = 1, and x_i ≥ 0
Return is well characterized by a Gaussian distribution N(µ, Σ), where µ ∈ R^n is the expected return and Σ ∈ R^{n×n} is the covariance

Examples:
Asset i has expected return µ_i = 2%, with std. dev. of √Σ_ii = 5%
Asset j has expected return µ_j = 5%, with std. dev. of √Σ_jj = 50%
Markowitz Portfolio Optimization - II

Suppose we seek to
maximize expected return, AND
minimize risk

These two objectives cannot be achieved without tradeoffs. Therefore, one often "scalarizes" this bi-criterion problem to explore the tradeoff:

minimize −µ^T x + γ · x^T Σ x    (26)
subject to 1^T x = 1, x ⪰ 0    (27)

where the parameter γ ≥ 0 is called the "risk aversion" parameter.
Increasing γ increases your sensitivity to risk
γ = 0 means you are risk neutral
γ < 0 means you are a risk seeker. Note this is NOT a convex QP.
Markowitz Portfolio Optimization - III

Consider this expected return & covariance data for a portfolio of 3 assets:

µ = [1.02, 1.05, 1.04]^T,  Σ = diag( (0.05)^2, (0.5)^2, (0.1)^2 )    (28)

Figure: Trade off between maximizing expected return and minimizing risk. This trade off curve is called a "Pareto Frontier".

Figure: Optimal portfolio investment strategy, as the risk aversion parameter γ increases.
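Problem (26)-(27) with the data in (28) can be sketched with SciPy's general-purpose SLSQP solver. In practice a dedicated QP solver would be used; the solver choice here is an assumption of this sketch, not part of the lecture:

```python
import numpy as np
from scipy.optimize import minimize

mu = np.array([1.02, 1.05, 1.04])              # expected returns from (28)
Sigma = np.diag([0.05**2, 0.5**2, 0.1**2])     # diagonal covariance from (28)

def solve_markowitz(gamma):
    """minimize -mu^T x + gamma * x^T Sigma x  s.t.  1^T x = 1, x >= 0"""
    obj = lambda x: -mu @ x + gamma * (x @ Sigma @ x)
    cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1.0},)
    res = minimize(obj, x0=np.full(3, 1.0/3.0), method='SLSQP',
                   bounds=[(0.0, 1.0)] * 3, constraints=cons)
    return res.x

x_neutral = solve_markowitz(0.0)   # risk neutral: concentrates in highest-return asset
x_averse = solve_markowitz(1.0)    # risk averse: diversifies toward low-variance assets
```

Sweeping γ and plotting expected return against x^T Σ x traces out the Pareto frontier described in the figure caption above.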
Quadratically Constrained QPs

A generalization of the convex QP problem is the quadratically constrained QP (QCQP):

minimize (1/2) x^T Q x + R^T x + S    (29)
subject to (1/2) x^T Q_i x + R_i^T x + S_i ≤ 0, ∀ i = 1, ..., m    (30)
A_eq x = b_eq    (31)

where Q, Q_i ⪰ 0 for the program to be convex.
Second Order Cone Programs

A Second Order Cone Program (SOCP) is defined as:

minimize f^T x    (32)
subject to ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i, i = 1, ..., m    (33)
A_eq x = b_eq    (34)

The inequalities form a "second order cone" constraint. The unit second-order (convex) cone of dimension k is

C_k = { [x; t] | x ∈ R^{k−1}, t ∈ R, ‖x‖ ≤ t }

which is also called the "ice cream" cone or "Lorentz" cone.

Figure: Boundary of the second-order cone in R^3, {(x_1, x_2, t) | (x_1^2 + x_2^2)^{1/2} ≤ t}.
Convex QP → SOCP - I

Consider the convex QP problem:

minimize (1/2) x^T Q x + R^T x + S    (35)
subject to Ax ≤ b, A_eq x = b_eq    (36)

where Q ∈ R^{n×n} is symmetric and positive definite, R ∈ R^n, and S ∈ R. Note that

‖ (1/√2) Q^{1/2} x + (√2/2) Q^{−1/2} R ‖^2 = (1/2) x^T Q x + R^T x + (1/2) R^T Q^{−1} R    (37)

This allows us to re-write this convex QP as

minimize ‖ (1/√2) Q^{1/2} x + (√2/2) Q^{−1/2} R ‖^2 − (1/2) R^T Q^{−1} R + S    (38)
subject to Ax ≤ b, A_eq x = b_eq    (39)
Convex QP → SOCP - II

which can be recast as the SOCP

minimize t    (40)
subject to ‖ (1/√2) Q^{1/2} x + (√2/2) Q^{−1/2} R ‖ ≤ t    (41)
Ax ≤ b, A_eq x = b_eq    (42)

Note:
Original convex QP: n optimization variables; m + l constraints
SOCP reformulation: n + 1 optimization variables; m + l + 1 constraints

Remark:
Can extend to semi-definite Q
Can extend to QCQP problems
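Identity (37), which drives the whole reformulation, can be checked numerically for a random positive definite Q; the random instance below is only a sanity check, not course data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # symmetric positive definite by construction
R = rng.standard_normal(n)
x = rng.standard_normal(n)

# matrix square root and inverse square root via eigendecomposition
w, V = np.linalg.eigh(Q)
Q_half = V @ np.diag(np.sqrt(w)) @ V.T
Q_neg_half = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# left-hand side of (37): || (1/sqrt2) Q^{1/2} x + (sqrt2/2) Q^{-1/2} R ||^2
lhs = np.linalg.norm(Q_half @ x / np.sqrt(2) + (np.sqrt(2) / 2) * Q_neg_half @ R)**2

# right-hand side of (37): (1/2) x^T Q x + R^T x + (1/2) R^T Q^{-1} R
rhs = 0.5 * x @ Q @ x + R @ x + 0.5 * R @ np.linalg.inv(Q) @ R
```

Expanding the squared norm produces the cross term (1/√2)(√2/2)·2·x^T R = x^T R, which is exactly how the linear term of the QP objective reappears.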
Robust Dispatch w/ High Penetration of Renewables

Problem Statement: Economically dispatch n generators to serve electricity demand D. In this case, a large percentage of the n generators are renewable.

Key Challenge: The maximum power generating capacity of renewable plants is uncertain. For example,
A wind farm can produce anywhere between 0 MW and 10 MW
A solar PV farm can produce anywhere between 0 MW and 20 MW
Robust Dispatch w/ High Penetration of Renewables

Model as an LP:

minimize f^T x    (43)
subject to R^T x ≥ D    (44)
0 ≤ x ≤ 1    (45)

x ∈ R^n is the vector of power generation dispatched to the generators, as a fraction of each generator's rated capacity
f ∈ R^n is the vector of marginal costs [USD/MW]
D ∈ R is the electricity demand [MW]
R ∈ R^n is the vector of real-time power capacity for the generators [MW]

Convert into standard form:

minimize f^T x    (46)
subject to a^T x ≤ b    (47)
0 ≤ x ≤ 1    (48)

where a = −R and b = −D
Robust Dispatch w/ High Penetration of Renewables
Focus on the uncertain parameter a. To illustrate, imagine...
generator i is a wind farm, 0 MW ≤ R_i ≤ 10 MW; −10 MW ≤ a_i ≤ 0 MW
generator j is a solar PV farm, 0 MW ≤ R_j ≤ 20 MW; −20 MW ≤ a_j ≤ 0 MW
generator k is a natural gas plant, R_k = 50 MW; a_k = −50 MW
Robust Dispatch w/ High Penetration of Renewables

Assumption: The vector a is known to lie within an ellipsoid

a ∈ E = { ā + Pu | ‖u‖_2 ≤ 1 }    (49)

where ā ∈ R^n is the ellipsoid center, and P ∈ R^{n×n} is a positive semidefinite matrix
wind farm i: ā_i = −5 MW
solar PV farm j: ā_j = −10 MW
natural gas plant k: ā_k = −50 MW

Recall that the eigenvalues λ_i(P) provide the semi-axis lengths of the ellipsoid. Define P as a diagonal matrix with:
wind farm i: P_ii = 5 MW
solar PV farm j: P_jj = 10 MW
natural gas plant k: P_kk = 0 MW

Note: when P is positive definite, we have a 100% renewable grid!
Robust Dispatch w/ High Penetration of Renewables

A robust LP requires us to satisfy demand in EVERY instance of a ∈ E:

minimize f^T x    (50)
subject to a^T x ≤ b, ∀ a ∈ E    (51)
0 ≤ x ≤ 1    (52)

Note: We have n optimization vars and +∞ constraints... not cool.

Alternative idea: Solve under the worst case scenario for a ∈ E. Convert (51) to

max { a^T x | a ∈ E } ≤ b    (53)

Re-write the left hand side of (53) as

max { a^T x | a ∈ E } = ā^T x + max { u^T P^T x | ‖u‖_2 ≤ 1 }    (54)
                      = ā^T x + ‖P^T x‖_2    (55)

then the robust linear constraint can be re-expressed as

ā^T x + ‖P^T x‖_2 ≤ b    (56)

which is a second order cone constraint
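Step (54)-(55) uses the fact that the supremum of u^T v over ‖u‖_2 ≤ 1 is ‖v‖_2, attained at u = v/‖v‖_2. A quick numerical sanity check with the three-generator numbers from the example (the dispatch vector x is illustrative):

```python
import numpy as np

abar = np.array([-5.0, -10.0, -50.0])  # ellipsoid center (wind, solar, gas)
P = np.diag([5.0, 10.0, 0.0])          # semi-axis lengths from the example
x = np.array([0.5, 0.5, 0.8])          # some dispatch fractions (illustrative)

v = P.T @ x
u_star = v / np.linalg.norm(v)         # maximizer of u^T v over ||u||_2 <= 1
worst = (abar + P @ u_star) @ x        # worst-case a^T x over the ellipsoid

closed_form = abar @ x + np.linalg.norm(P.T @ x)   # right-hand side of (55)

# any other feasible u must do no better than u_star
u_other = np.array([0.6, -0.8, 0.0])   # unit vector, illustrative
val_other = (abar + P @ u_other) @ x
```
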
Robust Dispatch w/ High Penetration of Renewables

Robust LP → SOCP, a sub-class of convex optimization problems:

minimize f^T x    (57)
subject to ā^T x + ‖P^T x‖_2 ≤ b    (58)
0 ≤ x ≤ 1    (59)

Note: the additional norm term acts as a regularization term. Namely, it prevents x from being large in directions with considerable uncertainty.
A Stochastic Approach to Robust Programming

In the previous example, we optimized w.r.t. the worst case scenario. Some would argue this is too conservative. That is, we could allow constraint violations in very low probability situations. This motivates "chance constraints".

Recall the LP

minimize c^T x    (60)
subject to a^T x ≤ b    (61)

Assume a ∈ R^n is a Gaussian random vector, i.e. a ~ N(ā, Σ). Then a^T x is a Gaussian random variable, with
mean ā^T x
variance x^T Σ x

Express the probability that a^T x ≤ b is satisfied as

Pr(a^T x ≤ b) = Φ( (b − ā^T x) / ‖Σ^{1/2} x‖_2 )    (62)

where Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−y^2/2} dy is the CDF of N(0, 1).
Chance Constraints

This enables us to relax (61) into

minimize c^T x    (63)
subject to Pr(a^T x ≤ b) ≥ η  ["chance constraint"]    (64)

In words, we require that a^T x ≤ b is satisfied with a reliability of η, where η is typically 0.9, 0.95, or 0.99.

Interestingly, we can use (62) to convert this stochastic LP into an SOCP:

Pr(a^T x ≤ b) = Φ( (b − ā^T x) / ‖Σ^{1/2} x‖_2 ) ≥ η    (65)
(b − ā^T x) / ‖Σ^{1/2} x‖_2 ≥ Φ^{−1}(η)    (66)
Φ^{−1}(η) · ‖Σ^{1/2} x‖_2 ≤ b − ā^T x    (67)

where we recognize the final inequality as a second order cone constraint
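Formula (62) can be sanity-checked by Monte Carlo. The numbers below are illustrative, and Φ is built from math.erf to avoid extra dependencies:

```python
import math
import numpy as np

abar = np.array([1.0, 2.0])        # mean of a (illustrative)
Sigma = np.diag([0.25, 0.04])      # covariance of a (illustrative, diagonal)
x = np.array([1.0, 1.0])
b = 3.5

# analytic probability via (62); Sigma is diagonal, so the elementwise
# sqrt of Sigma equals the matrix square root Sigma^{1/2}
z = (b - abar @ x) / np.linalg.norm(np.sqrt(Sigma) @ x)
Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
p_analytic = Phi(z)

# Monte Carlo estimate of Pr(a^T x <= b)
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(abar, Sigma, size=200_000)
p_mc = np.mean(samples @ x <= b)
```

The two estimates agree to within sampling error, confirming that a^T x is Gaussian with mean ā^T x and standard deviation ‖Σ^{1/2} x‖_2.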
Chance Constrained LP → SOCP

Thus, we converted the chance constrained LP into the SOCP

minimize c^T x    (68)
subject to ā^T x + Φ^{−1}(η) ‖Σ^{1/2} x‖_2 ≤ b    (69)

where Φ^{−1}(·) is the inverse CDF.

Note that Φ^{−1}(η) must be ≥ 0 for (69) to be a valid second order cone constraint. This is true if η ≥ 0.5. In practice, we almost always want reliability ≥ 0.5.

Comments:
the approach extends to chance constrained convex QPs
the approach does NOT directly extend to non-Gaussian distributions
much more elegant & efficient than two-stage stochastic programming

Want to learn more about robust programming? Read:
A. Ben-Tal, L. El Ghaoui, A. Nemirovski. Robust Optimization, Princeton University Press, 2009
Chance Constrained LP→ SOCP
thus, we converted chance constrained LP into the SOCP
minimize cTx (68)
subject to aTx + Φ−1(η)‖Σ1/2x‖2 ≤ b (69)
where Φ−1(·) is the inverse CDF
Note Φ−1(η) ≥ 0 to be valid second order cone constraint. True, if η ≥ 0.5. Inpractice, we almost always want reliability ≥ 0.5.
Comments:
approach extends to chanced constrained convex QPs
approach does NOT directly extend to non-Gaussian distributions
much more elegant & efficient than two-stage stochastic programming
Want to learn more about robust programming? ReadR. Bental, L. El Ghaoui, A. Nemirovski. Robust Optimization, PrincetonUniversity Press, 2009
Prof. Moura | Tsinghua-Berkeley Shenzhen Institute ENE 2XX | LEC 02 - Convex Programs Slide 50
Ex: Portfolio Optimization

Recall the portfolio optimization problem:

- x ∈ Rⁿ indicates the portfolio allocation; xᵢ is the fraction invested in asset i
- x must satisfy 1ᵀx = 1, x ⪰ 0
- the return (in percentage) is given by pᵀx, where p ∈ Rⁿ and p ∼ N(p̄, Σ)

Objective: Maximize expected return, subject to a limit on the probability of loss

    minimize    −E[pᵀx]                                        (70)
    subject to  Pr[pᵀx ≥ 1] ≥ η                                (71)
                1ᵀx = 1,  x ⪰ 0                                (72)

This can be recast as the SOCP

    minimize    −p̄ᵀx                                           (73)
    subject to  Φ⁻¹(η) · ‖Σ^{1/2}x‖₂ ≤ p̄ᵀx − 1                 (74)
                1ᵀx = 1,  x ⪰ 0                                (75)
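A minimal numerical sketch of (73)–(75), using SciPy's general-purpose SLSQP solver rather than a dedicated conic solver; the three-asset data are made up, with the third asset nearly risk-free:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Illustrative data for 3 assets (made up, not from the slides)
p_bar = np.array([1.10, 1.05, 1.01])          # mean returns
Sigma = np.diag([0.05, 0.01, 1e-8])           # return covariance; asset 3 ~ risk-free
eta = 0.9
k = norm.ppf(eta)                             # Phi^{-1}(eta) > 0 since eta > 0.5
L = np.linalg.cholesky(Sigma)

constraints = [
    # Chance constraint in SOC form, Eq. (74): k * ||Sigma^{1/2} x||_2 <= p_bar^T x - 1
    {"type": "ineq", "fun": lambda x: p_bar @ x - 1 - k * np.linalg.norm(L.T @ x)},
    # Budget constraint, Eq. (75): 1^T x = 1
    {"type": "eq", "fun": lambda x: np.sum(x) - 1},
]

# Start from the all-risk-free allocation, which is feasible
res = minimize(lambda x: -(p_bar @ x), x0=np.array([0.0, 0.0, 1.0]),
               bounds=[(0, 1)] * 3, constraints=constraints, method="SLSQP")

x_opt = res.x
print("allocation:", np.round(x_opt, 3), "expected return:", p_bar @ x_opt)
```

The solver trades some of the risk-free asset for the riskier, higher-return assets until the loss-probability constraint becomes active.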
Outline
1 Convex Programming
2 Linear Programming
3 Quadratic Programming
4 Second Order Cone Programming
5 Robust Programming & Chance Constraints
6 Maximum Likelihood Estimation
Problem Setting

Problem Statement: You are provided m data points for a random variable y. Goal: fit a probability distribution to this data.

Notation:

- p(y; θ) is the probability density function for y
- θ ∈ Rⁿ are the free parameters
- the "likelihood function" is p(y; θ) considered as a function of θ, for a fixed value of y
- the "log-likelihood function" is l(θ) = log p(y; θ)

Maximum Likelihood Estimation:

    θ⋆ = arg max_θ l(θ)                                        (76)

Remark: Interestingly, (76) is a convex optimization problem for many common scenarios.
MLE for Linear Models

Consider a linear measurement model:

    yᵢ = θᵀφᵢ + vᵢ,   i = 1, ⋯, m                              (77)

- θ ∈ Rⁿ are the parameters to be estimated
- yᵢ ∈ R are the measured data points
- φᵢ ∈ Rⁿ are the regressors
- vᵢ ∈ R are the measurement errors or noise. Assume the vᵢ are IID with probability density p(·)

The likelihood function, given all the measured points yᵢ and regressors φᵢ, is the product of the likelihoods of the individual data points (yᵢ, φᵢ):

    p(v; θ) = ∏_{i=1}^m p(yᵢ − θᵀφᵢ)                           (78)

The log-likelihood function is then

    l(θ) = log p(v; θ) = ∑_{i=1}^m log p(yᵢ − θᵀφᵢ)            (79)

(recall log(a · b) = log(a) + log(b))
MLE for Linear Models & Gaussian Noise

The MLE problem is

    maximize_θ  ∑_{i=1}^m log p(yᵢ − θᵀφᵢ)                     (80)

The likelihood function is log-concave for several common distributions.

Suppose that vᵢ ∼ N(0, σ²), so that p(v) = (2πσ²)^{−1/2} · e^{−v²/(2σ²)}. Then

    ∑_{i=1}^m log p(vᵢ) = ∑_{i=1}^m [ −(1/2) log(2πσ²) − vᵢ²/(2σ²) ]       (81)
                        = −(m/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^m vᵢ²       (82)

    ∑_{i=1}^m log p(yᵢ − θᵀφᵢ) = −(m/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^m (yᵢ − θᵀφᵢ)²   (83)
                               = −(m/2) log(2πσ²) − (1/(2σ²)) ‖Φᵀθ − Y‖₂²              (84)

where Φ = [φ₁, ⋯, φ_m] ∈ R^{n×m} and Y = [y₁; ⋯; y_m] ∈ R^m.
MLE for Linear Models & Gaussian Noise

The MLE problem with Gaussian noise is a least squares problem!

    θ⋆ = arg min_θ ‖Φᵀθ − Y‖₂²                                 (85)

Exercise: Derive the MLE optimization formulation for (77) for the following distributions for vᵢ:

1. Laplacian noise distribution: p(v) = 1/(2a) · e^{−|v|/a}
2. Uniform noise distribution: p(v) = 1/(2a) on [−a, +a] and zero elsewhere
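A quick numerical illustration of (85) with synthetic data (all values made up): generate y from the linear model (77) with Gaussian noise, then recover θ with NumPy's least squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data for the linear model (77): y_i = theta^T phi_i + v_i
theta_true = np.array([2.0, -1.0, 0.5])
m, n = 200, 3
Phi = rng.normal(size=(n, m))                        # regressors phi_i as columns of Phi
Y = Phi.T @ theta_true + 0.1 * rng.normal(size=m)    # Gaussian noise, sigma = 0.1

# MLE under Gaussian noise = least squares, Eq. (85)
theta_hat, *_ = np.linalg.lstsq(Phi.T, Y, rcond=None)
print("theta_hat:", np.round(theta_hat, 2))
```

With m = 200 points and modest noise, the estimate lands very close to the true θ.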
MLE for Linear Models & Laplacian Noise

Laplace distribution:

    p(v) = 1/(2a) · e^{−|v|/a}

[Figure: Laplace distribution densities. Image source: wikipedia.org]

    ∑_{i=1}^m log p(vᵢ) = ∑_{i=1}^m [ log(1/(2a)) − |vᵢ|/a ]
                        = m log(1/(2a)) − (1/a) ∑_{i=1}^m |vᵢ|             (87)

    ∑_{i=1}^m log p(yᵢ − θᵀφᵢ) = m log(1/(2a)) − (1/a) ∑_{i=1}^m |yᵢ − θᵀφᵢ|   (88)
                               = m log(1/(2a)) − (1/a) ‖Φᵀθ − Y‖₁             (89)

Hence the MLE problem with Laplacian noise is an L1-norm minimization problem:

    θ⋆ = arg min_θ ‖Φᵀθ − Y‖₁                                  (86)
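The L1 problem (86) is not an LP as written, but the standard epigraph trick makes it one: introduce t ∈ R^m with −t ≤ Φᵀθ − Y ≤ t and minimize 1ᵀt. A sketch with SciPy's linprog on synthetic, made-up data:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)

# Synthetic data: linear model (77) with Laplacian noise
theta_true = np.array([1.0, -2.0])
m, n = 300, 2
Phi = rng.normal(size=(n, m))
Y = Phi.T @ theta_true + rng.laplace(scale=0.1, size=m)

# LAD as an LP:  min 1^T t  s.t.  -t <= Phi^T theta - Y <= t
# Decision vector z = [theta (n entries); t (m entries)]
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[ Phi.T, -np.eye(m)],    #  Phi^T theta - t <=  Y
                 [-Phi.T, -np.eye(m)]])   # -Phi^T theta - t <= -Y
b_ub = np.concatenate([Y, -Y])
bounds = [(None, None)] * n + [(0, None)] * m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

theta_hat = res.x[:n]
print("theta_hat:", np.round(theta_hat, 2))
```

At the optimum each tᵢ equals |yᵢ − θᵀφᵢ|, so the LP objective equals ‖Φᵀθ − Y‖₁.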
Logistic Regression (with application to discrete choice models)

Consider a binary random variable y ∈ {0, 1}, with

    Pr[y = 1] = p,    Pr[y = 0] = 1 − p                        (90)

For example:

- y = 1 corresponds to "charge EV at Nanshan iPark chg station"
- y = 0 corresponds to "DO NOT charge EV at Nanshan iPark chg station"

Hypothesize that the probability p is a function of the EV driver's utility function, i.e. p(U). The utility function is

    U = aᵀφ + b                                                (91)

where a ∈ Rⁿ, b ∈ R are free parameters and φ ∈ Rⁿ are "explanatory" variables. Example explanatory variables:

- day of week
- time of day
- charging price
- whether a parking space is open

We relate the probability p to the utility U using the logistic model:

    p(U) = e^U / (1 + e^U) = e^{aᵀφ+b} / (1 + e^{aᵀφ+b})       (92)
Logistic Regression (with application to discrete choice models)

Logistic model:

    p(U) = e^U / (1 + e^U)

Objective: Given explanatory variables φ₁, ⋯, φ_m ∈ Rⁿ with corresponding outcomes y₁, ⋯, y_m ∈ {0, 1}, find the free parameters a, b via MLE.

Re-order the data so that

- φ₁, ⋯, φ_q correspond to outcome y = 1
- φ_{q+1}, ⋯, φ_m correspond to outcome y = 0
Likelihood function:

    p(y; a, b) = ∏_{i=1}^q pᵢ · ∏_{i=q+1}^m (1 − pᵢ)                       (93)

Log-likelihood function:

    l(a, b) = ∑_{i=1}^q log pᵢ + ∑_{i=q+1}^m log(1 − pᵢ)                   (94)
            = ∑_{i=1}^q log( e^{Uᵢ} / (1 + e^{Uᵢ}) ) + ∑_{i=q+1}^m log( 1 / (1 + e^{Uᵢ}) )   (95)
            = ∑_{i=1}^q [ Uᵢ − log(1 + e^{Uᵢ}) ] − ∑_{i=q+1}^m log(1 + e^{Uᵢ})               (96)
            = ∑_{i=1}^q Uᵢ − ∑_{i=1}^m log(1 + e^{Uᵢ})                                       (97)
            = ∑_{i=1}^q [ aᵀφᵢ + b ] − ∑_{i=1}^m log( 1 + e^{aᵀφᵢ + b} )                     (98)

where this last expression is a concave function w.r.t. (a, b). Solve via CP.
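A minimal sketch of maximizing (98) numerically (equivalently, minimizing its negative) with SciPy's BFGS on synthetic, made-up data. The code keeps y in its original order, using ∑ yᵢUᵢ in place of the re-ordered ∑_{i=1}^q Uᵢ, and uses np.logaddexp(0, U) as a numerically stable log(1 + e^U):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Synthetic discrete-choice data: U = a^T phi + b, Pr[y = 1] = e^U / (1 + e^U)
a_true, b_true = np.array([1.5, -2.0]), 0.5
m = 2000
phi = rng.normal(size=(m, 2))
U_true = phi @ a_true + b_true
y = (rng.random(m) < 1.0 / (1.0 + np.exp(-U_true))).astype(float)

# Negative log-likelihood, i.e. minus Eq. (98); convex in (a, b)
def nll(params):
    a, b = params[:2], params[2]
    U = phi @ a + b
    return np.sum(np.logaddexp(0.0, U) - y * U)

res = minimize(nll, x0=np.zeros(3), method="BFGS")
a_hat, b_hat = res.x[:2], res.x[2]
print("a_hat:", np.round(a_hat, 2), "b_hat:", round(b_hat, 2))
```

Because the log-likelihood is concave, any local minimizer of the negative log-likelihood found here is the global MLE.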