ENE 2XX: Renewable Energy Systems and Control
LEC 02 : Convex Programs
Professor Scott Moura, University of California, Berkeley
Summer 2017
Prof. Moura | Tsinghua-Berkeley Shenzhen Institute ENE 2XX | LEC 02 - Convex Programs Slide 1
What is an Optimization Program?

We seek "the best" values for design variables x ∈ R^n
Must respect certain constraints / limitations

minimize f(x)  [Objective Function]
subject to g_i(x) ≤ 0, i = 1, ..., m  [Inequality constraints]
h_j(x) = 0, j = 1, ..., l  [Equality constraints]

A value x* that solves this optimization program is called a "minimizer".
What is an Optimization Program?

We seek "the best" values for design variables x ∈ R^n
Must respect certain constraints / limitations

minimize f(x)  [Objective Function]
subject to g(x) ≤ 0  [Inequality constraints]
h(x) = 0  [Equality constraints]

Vector notation

A value x* that solves this optimization program is called a "minimizer".
Classes of Optimization Programs

[Figure: nested classes of optimization programs]

LP = Linear Program; QP = Quadratic Program; CP = Convex Program; NLP = Nonlinear Program; MIP = Mixed Integer Program
Outline
1 Convex Programming
2 Linear Programming
3 Quadratic Programming
4 Second Order Cone Programming
5 Robust Programming & Chance Constraints
6 Maximum Likelihood Estimation
Convex Programs

A convex optimization problem has the form

minimize f(x)    (1)
subject to g_i(x) ≤ 0, i = 1, ..., m    (2)
a_j^T x = b_j, j = 1, ..., l    (3)

Comparing this problem with the abstract optimization problem defined before, the convex optimization problem has three additional requirements:
the objective function f(x) must be convex,
the inequality constraint functions g_i(x) must be convex for all i = 1, ..., m,
the equality constraint functions h_j(x) must be affine for all j = 1, ..., l.

Note that in the convex optimization problem, we can only tolerate affine equality constraints, meaning (3) takes the matrix-vector form A_eq x = b_eq.
Why care?

No general analytic solutions exist; however, VERY powerful methods exist to solve CPs numerically
Ex: Easily solve CPs with 100's or 1000's of variables in just a few seconds
Ex: Easily solve CPs with millions of variables in tens of seconds
CP solvers are off-the-shelf technology
YOUR focus: Find ways to convert your problem into a CP
If you formulate your problem as a CP, then you have essentially solved it
Converting your problem into a CP requires both art & technical skill
Key CP Properties
If a local minimum exists, then it is the global minimum.
If the objective function is strictly convex, and a local minimum exists, then it is the unique minimum.
Sub-classes of Convex Programs
Linear Programs (LPs)
Some Quadratic Programs (QPs)
Second Order Cone Programs (SOCPs)
Maximum Likelihood Estimation (MLE)
Geometric Programs (GPs)
Semidefinite Programs (SDPs)
Exercise

Is this a convex program?

minimize f(x) = x_1^2 + x_2^2    (4)
subject to g_1(x) = x_1/(1 + x_2^2) ≤ 0    (5)
h_1(x) = (x_1 + x_2)^2 = 0    (6)

NOT a convex program.
Inequality constraint function g_1(x) is not convex in (x_1, x_2)
Equality constraint function h_1(x) is not affine in (x_1, x_2)

Now, an astute observer might comment that both sides of (5) can be multiplied by (1 + x_2^2) and (6) can be represented simply by x_1 + x_2 = 0, without loss of generality.
Exercise
Is this a convex program?
minimize f(x) = x_1^2 + x_2^2    (4)
subject to g1(x) = x1 ≤ 0 (5)
h1(x) = x1 + x2 = 0 (6)
YES. This is a convex program.
Objective function f(x) is convex in (x1, x2)
Inequality constraint function g1(x) is convex in (x1, x2)
Equality constraint function h1(x) is affine in (x1, x2)
Linear Programs
Linear program (LP) is defined as the following special case of a CP:
minimize cTx (7)
subject to Ax ≤ b (8)
Aeqx = beq (9)
f(x) must be linear (or affine, before dropping the additive constant)
g_i(x) and h_j(x) must be affine for all i and j, respectively.

Figure: The feasible set of an LP always forms a polyhedron P. The objective function is visualized as isolines of constant cost (dotted lines). The optimal solution is at the boundary point that touches the isoline of least cost.
Nature of LP Solutions
Proposition (Nature of LP Solutions)
The solution to any linear program is characterized by one of the following three categories:

[No Solution] Occurs when the feasible set is empty, or the objective function is unbounded.
[One Unique Solution] There exists a single unique solution at a vertex of the feasible set. That is, at least two constraints are active and their intersection gives the optimal solution. (see previous slide)
[A Non-Unique Solution] There exists an infinite number of solutions, given by one edge of the feasible set. That is, one or more constraints are active and all solutions along the intersection of these constraints are equally optimal. This can only occur when the objective function gradient is orthogonal to one or more constraints.
LP Examples

Diet Problem: choose quantities x_1, ..., x_n of n foods
one unit of food j costs c_j, contains amount a_ij of nutrient i
a healthy diet requires nutrient i in quantity at least b_i
to find the cheapest healthy diet:

minimize c^T x
subject to Ax ≥ b, x ≥ 0

Minimize a piecewise affine (PWA) function:

minimize max_{i=1,...,m} { a_i^T x + b_i }

is equivalent to the LP

minimize t
subject to a_i^T x + b_i ≤ t, ∀ i = 1, ..., m
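The epigraph trick above can be sketched numerically. The scalar example below (made up for illustration; the slides do not prescribe a solver) minimizes max(x, −x, 0.5x + 1) by solving the equivalent LP in the variables (x, t) with SciPy's linprog:

```python
import numpy as np
from scipy.optimize import linprog

# PWA function: max_i (a_i * x + b_i), with three affine pieces (illustrative)
a = np.array([1.0, -1.0, 0.5])
b = np.array([0.0, 0.0, 1.0])

# Epigraph LP in variables z = [x, t]: minimize t  s.t.  a_i*x - t <= -b_i
c = np.array([0.0, 1.0])                   # objective: minimize t
A_ub = np.column_stack([a, -np.ones(3)])   # rows: [a_i, -1]
b_ub = -b
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None), (None, None)])

x_opt, t_opt = res.x
# the minimum of max(x, -x, 0.5x+1) is 2/3, attained at x = -2/3
```

At the optimum the two active pieces (−x and 0.5x + 1) intersect, illustrating the "at least two constraints active" case from the proposition above.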
Optimal Economic Dispatch

You are the California Independent System Operator (CAISO). You must schedule power generators for tomorrow (24 one-hour segments) to satisfy electricity demand. Given data:
Generator i provides "marginal cost" c_i (units of USD/MW). Quantity c_i is the financial compensation each generator requests for providing 1 MW.
Generator i has maximum power capacity of x_i,max (units of MW).
California electricity demand is D(k), where k indexes each hour, i.e. k = 0, 1, ..., 23.

minimize sum_{k=0}^{23} sum_{i=1}^{n} c_i x_i(k)    (10)
subject to 0 ≤ x_i(k) ≤ x_i,max, ∀ i = 1, ..., n, k = 0, ..., 23    (11)
sum_{i=1}^{n} x_i(k) = D(k), k = 0, ..., 23    (12)

The optimization variable x_i(k) is the power produced by generator i during hour k.
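A minimal single-hour instance of (10)-(12) can be sketched with SciPy's linprog. The three generators, costs, and demand below are made-up numbers, not data from the lecture:

```python
import numpy as np
from scipy.optimize import linprog

ci = np.array([10.0, 20.0, 50.0])    # marginal costs [USD/MW] (hypothetical)
xmax = np.array([30.0, 20.0, 50.0])  # capacities [MW] (hypothetical)
D = 60.0                             # demand for this hour [MW]

# minimize c^T x  s.t.  sum_i x_i = D,  0 <= x_i <= x_i,max
res = linprog(ci, A_eq=np.ones((1, 3)), b_eq=[D],
              bounds=list(zip(np.zeros(3), xmax)))

# cheapest generators fill up first: x = [30, 20, 10], total cost = 1200
```

The solution reproduces the "merit order" behavior in the figures that follow: capacity is taken from the cheapest generators until demand is met.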
Optimal Economic Dispatch

Figure: [LEFT] Marginal cost of electricity for various generators, as a function of cumulative capacity. The purple line indicates the total demand D(k). All generators left of the purple line are dispatched. [RIGHT] Optimal supply mix and demand for 03:00.
Optimal Economic Dispatch

Figure: [LEFT] Marginal cost of electricity for various generators, as a function of cumulative capacity. The purple line indicates the total demand D(k). All generators left of the purple line are dispatched. [RIGHT] Optimal supply mix and demand for 19:00.
Quadratic Programs

A quadratic program (QP) is defined as:

minimize (1/2) x^T Q x + R^T x + S    (10)
subject to Ax ≤ b    (11)
A_eq x = b_eq    (12)

f(x) must be quadratic in x
g_i(x) and h_j(x) must be affine for all i and j, respectively.

Figure: The feasible set of a QP always forms a polyhedron P. The objective function is visualized as convex quadratic iso-contours of constant cost (dotted lines).
Remark
Not all QPs are convex programs! A QP is a convex program only if Q ⪰ 0, i.e. Q is positive semi-definite. QPs where Q is not positive semi-definite are called non-convex QPs and are generally very hard to solve.
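The remark can be checked numerically: a QP is convex exactly when every eigenvalue of (the symmetric part of) Q is nonnegative. A small NumPy sketch with illustrative matrices, not ones from the slides:

```python
import numpy as np

def is_convex_qp(Q, tol=1e-10):
    """A QP with Hessian Q is convex iff Q is positive semi-definite,
    i.e. all eigenvalues of the symmetric part are >= 0."""
    Qs = 0.5 * (Q + Q.T)              # only the symmetric part affects x^T Q x
    return np.linalg.eigvalsh(Qs).min() >= -tol

Q_convex = np.array([[2.0, 0.0], [0.0, 1.0]])      # eigenvalues 2, 1
Q_nonconvex = np.array([[1.0, 0.0], [0.0, -1.0]])  # eigenvalue -1 < 0
```
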
Linear Regression Models
more specifically, linear-in-the-parameters models

Suppose you have data comprised of n data pairs (x_i, y_i), where i = 1, ..., n. You seek to fit a mathematical model to this data, of the form:

y = θ_1 x + θ_0    (13)

How do we determine θ_1, θ_0?

Regression Analysis
Establish a mathematical relationship between variables, given data.

Quoted Text Message from Tech IP Attorney to Me
Attorney: One of our outside consultant firms billed us 170,000 USD to do an SQL regression model
Me: My undergrads would do that for 25 USD and pizza.
Attorney: yeah, next time we should go that route
Graphical Version
Determine a "best fit" for m, b in the linear model

y = mx + b    (14)

given n data pairs (x_i, y_i), where i = 1, ..., n.
In other words, find the line that best fits the data points:
Least Squares
a.k.a. Ordinary Least Squares (OLS) or Linear Least Squares

Let us define best fit as follows. Define the "residual" r_i for m, b and data pair (x_i, y_i) as follows:

r_i = m x_i + b − y_i    (15)

Obviously, when r_i = 0, then m, b fit that data pair perfectly. We would like to select m, b such that the sum of all squared residuals is minimized:

min_{m,b} sum_{i=1}^{n} r_i^2 = min_{m,b} sum_{i=1}^{n} (m x_i + b − y_i)^2    (16)
                             = min_{θ=[m,b]} ‖Xθ − Y‖_2^2    (17)

where

θ = [m; b],  X = [x_1 1; x_2 1; ...; x_n 1],  Y = [y_1; y_2; ...; y_n]    (18)
Graphical Result

[Figure: best-fit line through the data points]
Other Linear-in-the-Parameter Models

Polynomial: y = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_p x^p. Residual r = Xθ − Y

θ = [θ_0; θ_1; ...; θ_p],  X = [1 x_1 x_1^2 ... x_1^p; 1 x_2 x_2^2 ... x_2^p; ...; 1 x_n x_n^2 ... x_n^p],  Y = [y_1; y_2; ...; y_n]    (19)

Harmonic: y = θ_1 sin(x) + θ_2 cos(x) + θ_3 sin(2x) + θ_4 cos(2x). Residual r = Xθ − Y

θ = [θ_1; θ_2; θ_3; θ_4],  X = [sin(x_1) cos(x_1) sin(2x_1) cos(2x_1); sin(x_2) cos(x_2) sin(2x_2) cos(2x_2); ...; sin(x_n) cos(x_n) sin(2x_n) cos(2x_n)],  Y = [y_1; y_2; ...; y_n]    (20)
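Both design matrices above are assembled column by column. A NumPy sketch for the polynomial case (19) and the harmonic case (20), with made-up sample points:

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0, 1.5])  # sample points (illustrative)
p = 3                               # polynomial degree

# Polynomial design matrix (19): columns 1, x, x^2, ..., x^p
X_poly = np.column_stack([x**k for k in range(p + 1)])

# Harmonic design matrix (20): columns sin(x), cos(x), sin(2x), cos(2x)
X_harm = np.column_stack([np.sin(x), np.cos(x), np.sin(2*x), np.cos(2*x)])
```

Either matrix then plugs directly into the least-squares machinery on the following slides, since the model stays linear in θ regardless of how nonlinear the columns are in x.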
Other Linear-in-the-Parameter Models

Radial Basis Function: y = θ_1 e^{−(x+0.5)^2} + θ_2 e^{−x^2} + θ_3 e^{−(x−0.5)^2}. Residual r = Xθ − Y

θ = [θ_1; θ_2; θ_3],  X = [e^{−(x_1+0.5)^2} e^{−x_1^2} e^{−(x_1−0.5)^2}; e^{−(x_2+0.5)^2} e^{−x_2^2} e^{−(x_2−0.5)^2}; ...; e^{−(x_n+0.5)^2} e^{−x_n^2} e^{−(x_n−0.5)^2}],  Y = [y_1; y_2; ...; y_n]    (21)

limited only by your imagination
Optimization Perspective

All regression problems for linear-in-the-parameters models can be written:

minimize_θ ‖Xθ − Y‖_2^2,  X ∈ R^{n×p}, θ ∈ R^p, Y ∈ R^n    (22)

n : number of data pairs (x_i, y_i)
p : number of coefficients θ_1, ..., θ_p.
We assume n > p.

Recall the First Order Necessary Condition (FONC): If θ* minimizes (22), then d/dθ ‖Xθ − Y‖_2^2 = 0. Let's expand this condition!

0 = d/dθ ‖Xθ − Y‖_2^2
  = d/dθ (Xθ − Y)^T (Xθ − Y)
  = d/dθ (θ^T X^T X θ − 2 Y^T X θ + Y^T Y)
  = 2 X^T X θ − 2 X^T Y
⇒ θ* = (X^T X)^{−1} X^T Y
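The closed form θ* = (X^T X)^{−1} X^T Y can be verified on noise-free synthetic data; the line y = 2x + 1 below is an illustration, not course data:

```python
import numpy as np

xi = np.linspace(0.0, 1.0, 10)
yi = 2.0 * xi + 1.0                          # exact line: m = 2, b = 1

X = np.column_stack([xi, np.ones_like(xi)])  # rows [x_i, 1], so theta = [m, b]
Y = yi

# theta* = (X^T X)^{-1} X^T Y; solving the normal equations is preferred
# over forming an explicit matrix inverse
theta = np.linalg.solve(X.T @ X, X.T @ Y)
```

With noise-free data the fit recovers [m, b] = [2, 1] exactly (up to floating point), since the FONC has a unique solution when n > p and X has full column rank.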
Least Squares with L2 Regularization
a.k.a. Ridge Regression

What if we define "best fit" by a different criterion? For example, we minimize the sum of squared residuals, but penalize the coefficients from getting "too big". Consider

minimize_θ ‖Xθ − Y‖_2^2 + α‖θ‖_2^2    (23)

Apply the FONC:

0 = d/dθ (‖Xθ − Y‖_2^2 + α‖θ‖_2^2)
  = d/dθ ((Xθ − Y)^T (Xθ − Y) + α θ^T θ)
  = d/dθ (θ^T (X^T X + αI) θ − 2 Y^T X θ + Y^T Y)
  = 2 (X^T X + αI) θ − 2 X^T Y
⇒ θ* = (X^T X + αI)^{−1} X^T Y
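Ridge differs from OLS only by the αI term. The sketch below (same illustrative line data as before, not course data) shows the coefficient norm shrinking once α > 0:

```python
import numpy as np

xi = np.linspace(0.0, 1.0, 10)
Y = 2.0 * xi + 1.0                           # exact line: m = 2, b = 1
X = np.column_stack([xi, np.ones_like(xi)])

def ridge(X, Y, alpha):
    """theta* = (X^T X + alpha*I)^{-1} X^T Y via the regularized normal equations."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ Y)

theta_ols = ridge(X, Y, 0.0)   # alpha = 0 recovers ordinary least squares
theta_r = ridge(X, Y, 1.0)     # alpha > 0 shrinks the coefficients toward zero
```

Note a practical side benefit: X^T X + αI is always invertible for α > 0, even when X^T X alone is singular.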
Ridge Coefficients as you vary α
Least Squares with L1 Regularization
a.k.a. Lasso Regression

What if we define "best fit" by a different criterion? For example, suppose our data occasionally contains outliers that can bias our fitted linear model undesirably. Is there a "robust regression" method? Yes.

min_θ ‖Xθ − Y‖_2^2 + α‖θ‖_1    (24)

L2 penalties place small weight on small coefficients
θ_i^2 is very small when θ_i is small
Little incentive to drive θ_i to zero, unless you consider |θ_i| instead.

Note: Due to the 1-norm, this is no longer a QP! It is, however, a CP.
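Problem (24) has no closed form, but it is easy to solve numerically. One simple method, not covered in these slides, is proximal gradient descent (ISTA): alternate a gradient step on the smooth term with a soft-threshold that pushes small coefficients exactly to zero. A sketch on illustrative synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
Y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.01 * rng.standard_normal(n)

alpha = 5.0
# step = 1/L, where L = 2*lambda_max(X^T X) is the gradient's Lipschitz constant
step = 1.0 / (2.0 * np.linalg.eigvalsh(X.T @ X).max())

theta = np.zeros(p)
for _ in range(2000):
    z = theta - step * 2.0 * X.T @ (X @ theta - Y)   # gradient step on ||X theta - Y||^2
    theta = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)  # soft-threshold

def lasso_obj(t):
    return np.sum((X @ t - Y)**2) + alpha * np.sum(np.abs(t))

theta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]     # unregularized fit, for comparison
```

The soft-threshold is exactly why L1 regularization zeros out coefficients: any coordinate whose magnitude falls below step*alpha is clipped to zero, which an L2 penalty never does.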
y = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_8 x^8

[Figure: battery voltage vs. SOC data with LSQ fit]

θ_0 = 2.6460; θ_1 = 5.5442; θ_2 = −15.7690; θ_3 = 16.4894; θ_4 = −0.9965; θ_5 = −4.2202; θ_6 = −2.8927; θ_7 = 0.0602; θ_8 = 2.6326
y = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_8 x^8

[Figure: battery voltage vs. SOC data with L1-regularized fit, α = 0.0042]

θ_0 = 2.9873; θ_1 = 0.9366; θ_2 = −0.5531; θ_3 = −0.0641; θ_4 = 0; θ_5 = 0; θ_6 = 0; θ_7 = 0; θ_8 = 0.1052
y = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_8 x^8

[Figure: battery voltage vs. SOC data with L1-regularized fit, α = 0.0083]

θ_0 = 3.0579; θ_1 = 0.4773; θ_2 = 0; θ_3 = −0.1202; θ_4 = 0; θ_5 = 0; θ_6 = 0; θ_7 = 0; θ_8 = 0
y = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_8 x^8

[Figure: battery voltage vs. SOC data with L1-regularized fit, α = 0.0125]

θ_0 = 3.0889; θ_1 = 0.3547; θ_2 = 0; θ_3 = 0; θ_4 = 0; θ_5 = 0; θ_6 = 0; θ_7 = 0; θ_8 = 0
y = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_8 x^8

[Figure: battery voltage vs. SOC data with L2-regularized fits for α = 0.000, 0.001, 0.010]

α = 0.000: θ_0 = 2.1; θ_1 = 28.8; θ_2 = −301; θ_3 = 1595; θ_4 = −4743; θ_5 = 8222; θ_6 = −8238; θ_7 = 4414; θ_8 = −977
α = 0.001: θ_0 = 2.71; θ_1 = 4.13; θ_2 = −8.51; θ_3 = 4.10; θ_4 = 3.42; θ_5 = −0.34; θ_6 = −2.31; θ_7 = −1.49; θ_8 = 1.75
α = 0.010: θ_0 = 2.82; θ_1 = 2.40; θ_2 = −2.81; θ_3 = −0.33; θ_4 = 0.71; θ_5 = 0.69; θ_6 = 0.32; θ_7 = −0.04; θ_8 = −0.27
A Generalized Linear Model

The generalized linear model is given by:

y = sum_{i=1}^{p} θ_i φ_i(x) = θ^T φ(x)    (25)

y ∈ R is the output of interest
θ ∈ R^p are the coefficients or parameters to fit
φ(x) are "regressors" or "predictors", which can involve the data in a nonlinear way

Summary of Regression Procedures
Least Squares (LSQ), a.k.a. linear least squares, ordinary least squares (convex QP)
LSQ w/ L2 Regularization, a.k.a. ridge regression (convex QP)
LSQ w/ L1 Regularization, a.k.a. lasso regression (CP)
LSQ w/ L1 and L2 Regularization, a.k.a. elastic net (CP)
LSQ w/ Huber Regularization (hybridized L1+L2), a.k.a. robust LSQ (CP)
Markowitz Portfolio Optimization - I

Problem Statement: Imagine you are an investment portfolio manager. You control a large sum of money, and can invest in n different assets. At the end of some time period, your investment produces a financial return. The key challenge, here, is that the return is not easily predictable. It is random.

Notation:
x_i denotes the percentage of the fund to invest in asset i. Note that sum_{i=1}^{n} x_i = 1, and x_i ≥ 0
Return is well characterized by a Gaussian distribution N(µ, Σ), where µ ∈ R^n is the expected return and Σ ∈ R^{n×n} is the covariance

Examples:
Asset i has expected return µ_i = 2%, with std. dev. of √Σ_ii = 5%
Asset j has expected return µ_j = 5%, with std. dev. of √Σ_jj = 50%
Markowitz Portfolio Optimization - II

Suppose we seek to
maximize expected return, AND
minimize risk

These two objectives cannot be achieved without tradeoffs. Therefore, one often "scalarizes" this bi-criterion problem to explore the tradeoff:

minimize −µ^T x + γ · x^T Σ x    (26)
subject to 1^T x = 1, x ⪰ 0    (27)

where the parameter γ ≥ 0 is called the "risk aversion" parameter.
Increasing γ increases your sensitivity to risk
γ = 0 means you are risk neutral
γ < 0 means you are a risk seeker. Note this is NOT a convex QP.
Markowitz Portfolio Optimization - III

Consider this expected return & covariance data for a portfolio of 3 assets:

µ = [1.02, 1.05, 1.04]^T,  Σ = diag( (0.05)^2, (0.5)^2, (0.1)^2 )    (28)

Figure: Trade off between maximizing expected return and minimizing risk. This trade off curve is called a "Pareto Frontier".

Figure: Optimal portfolio investment strategy, as the risk aversion parameter γ increases.
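Problem (26)-(27) with the data in (28) can be sketched with SciPy's general-purpose SLSQP solver. In practice a dedicated QP solver would be used; the solver choice here is an assumption of this sketch, not part of the lecture:

```python
import numpy as np
from scipy.optimize import minimize

mu = np.array([1.02, 1.05, 1.04])              # expected returns from (28)
Sigma = np.diag([0.05**2, 0.5**2, 0.1**2])     # diagonal covariance from (28)

def solve_markowitz(gamma):
    """minimize -mu^T x + gamma * x^T Sigma x  s.t.  1^T x = 1, x >= 0"""
    obj = lambda x: -mu @ x + gamma * (x @ Sigma @ x)
    cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1.0},)
    res = minimize(obj, x0=np.full(3, 1.0/3.0), method='SLSQP',
                   bounds=[(0.0, 1.0)] * 3, constraints=cons)
    return res.x

x_neutral = solve_markowitz(0.0)   # risk neutral: concentrates in highest-return asset
x_averse = solve_markowitz(1.0)    # risk averse: diversifies toward low-variance assets
```

Sweeping γ and plotting expected return against x^T Σ x traces out the Pareto frontier described in the figure caption above.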
Quadratically Constrained QPs

A generalization of the convex QP problem is the quadratically constrained QP (QCQP):

minimize (1/2) x^T Q x + R^T x + S    (29)
subject to (1/2) x^T Q_i x + R_i^T x + S_i ≤ 0, ∀ i = 1, ..., m    (30)
A_eq x = b_eq    (31)

where Q, Q_i ⪰ 0 for the program to be convex.
Second Order Cone Programs

A Second Order Cone Program (SOCP) is defined as:

minimize f^T x    (32)
subject to ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i, i = 1, ..., m    (33)
A_eq x = b_eq    (34)

The inequalities form a "second order cone" constraint. The unit second-order (convex) cone of dimension k is

C_k = { [x; t] | x ∈ R^{k−1}, t ∈ R, ‖x‖ ≤ t }

which is also called the "ice cream" cone or "Lorentz" cone.

Figure: Boundary of the second-order cone in R^3, {(x_1, x_2, t) | (x_1^2 + x_2^2)^{1/2} ≤ t}.
Convex QP → SOCP - I

Consider the convex QP problem:

minimize (1/2) x^T Q x + R^T x + S    (35)
subject to Ax ≤ b, A_eq x = b_eq    (36)

where Q ∈ R^{n×n} is symmetric and positive definite, R ∈ R^n, and S ∈ R. Note that

‖ (1/√2) Q^{1/2} x + (√2/2) Q^{−1/2} R ‖^2 = (1/2) x^T Q x + R^T x + (1/2) R^T Q^{−1} R    (37)

This allows us to re-write this convex QP as

minimize ‖ (1/√2) Q^{1/2} x + (√2/2) Q^{−1/2} R ‖^2 − (1/2) R^T Q^{−1} R + S    (38)
subject to Ax ≤ b, A_eq x = b_eq    (39)
Convex QP → SOCP - II

which can be recast as the SOCP

minimize t    (40)
subject to ‖ (1/√2) Q^{1/2} x + (√2/2) Q^{−1/2} R ‖ ≤ t    (41)
Ax ≤ b, A_eq x = b_eq    (42)

Note:
Original convex QP: n optimization variables; m + l constraints
SOCP reformulation: n + 1 optimization variables; m + l + 1 constraints

Remark:
Can extend to semi-definite Q
Can extend to QCQP problems
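Identity (37), which drives the whole reformulation, can be checked numerically for a random positive definite Q; the random instance below is only a sanity check, not course data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # symmetric positive definite by construction
R = rng.standard_normal(n)
x = rng.standard_normal(n)

# matrix square root and inverse square root via eigendecomposition
w, V = np.linalg.eigh(Q)
Q_half = V @ np.diag(np.sqrt(w)) @ V.T
Q_neg_half = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# left-hand side of (37): || (1/sqrt2) Q^{1/2} x + (sqrt2/2) Q^{-1/2} R ||^2
lhs = np.linalg.norm(Q_half @ x / np.sqrt(2) + (np.sqrt(2) / 2) * Q_neg_half @ R)**2

# right-hand side of (37): (1/2) x^T Q x + R^T x + (1/2) R^T Q^{-1} R
rhs = 0.5 * x @ Q @ x + R @ x + 0.5 * R @ np.linalg.inv(Q) @ R
```

Expanding the squared norm produces the cross term (1/√2)(√2/2)·2·x^T R = x^T R, which is exactly how the linear term of the QP objective reappears.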
Robust Dispatch w/ High Penetration of Renewables

Problem Statement: Economically dispatch n generators to serve electricity demand D. In this case, a large percentage of the n generators are renewable.

Key Challenge: The maximum power generating capacity of renewable plants is uncertain. For example,
A wind farm can produce anywhere between 0 MW and 10 MW
A solar PV farm can produce anywhere between 0 MW and 20 MW
Robust Dispatch w/ High Penetration of Renewables

Model as an LP:

minimize f^T x    (43)
subject to R^T x ≥ D    (44)
0 ≤ x ≤ 1    (45)

x ∈ R^n is the vector of power generation dispatched to the generators, as a fraction of each generator's rated capacity
f ∈ R^n is the vector of marginal costs [USD/MW]
D ∈ R is the electricity demand [MW]
R ∈ R^n is the vector of real-time power capacity for the generators [MW]

Convert into standard form:

minimize f^T x    (46)
subject to a^T x ≤ b    (47)
0 ≤ x ≤ 1    (48)

where a = −R and b = −D
Robust Dispatch w/ High Penetration of Renewables
Focus on the uncertain parameter a. To illustrate, imagine...
generator i is a wind farm, 0 MW ≤ R_i ≤ 10 MW; −10 MW ≤ a_i ≤ 0 MW
generator j is a solar PV farm, 0 MW ≤ R_j ≤ 20 MW; −20 MW ≤ a_j ≤ 0 MW
generator k is a natural gas plant, R_k = 50 MW; a_k = −50 MW
Robust Dispatch w/ High Penetration of Renewables

Assumption: The vector a is known to lie within an ellipsoid

a ∈ E = { ā + Pu | ‖u‖_2 ≤ 1 }    (49)

where ā ∈ R^n is the ellipsoid center, and P ∈ R^{n×n} is a positive semidefinite matrix
wind farm i: ā_i = −5 MW
solar PV farm j: ā_j = −10 MW
natural gas plant k: ā_k = −50 MW

Recall that the eigenvalues λ_i(P) provide the semi-axis lengths of the ellipsoid. Define P as a diagonal matrix with:
wind farm i: P_ii = 5 MW
solar PV farm j: P_jj = 10 MW
natural gas plant k: P_kk = 0 MW

Note: when P is positive definite, we have a 100% renewable grid!
Robust Dispatch w/ High Penetration of Renewables

A robust LP requires us to satisfy demand in EVERY instance of a ∈ E:

minimize f^T x    (50)
subject to a^T x ≤ b, ∀ a ∈ E    (51)
0 ≤ x ≤ 1    (52)

Note: We have n optimization vars and +∞ constraints... not cool.

Alternative idea: Solve under the worst case scenario for a ∈ E. Convert (51) to

max { a^T x | a ∈ E } ≤ b    (53)

Re-write the left hand side of (53) as

max { a^T x | a ∈ E } = ā^T x + max { u^T P^T x | ‖u‖_2 ≤ 1 }    (54)
                      = ā^T x + ‖P^T x‖_2    (55)

then the robust linear constraint can be re-expressed as

ā^T x + ‖P^T x‖_2 ≤ b    (56)

which is a second order cone constraint
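Step (54)-(55) uses the fact that the supremum of u^T v over ‖u‖_2 ≤ 1 is ‖v‖_2, attained at u = v/‖v‖_2. A quick numerical sanity check with the three-generator numbers from the example (the dispatch vector x is illustrative):

```python
import numpy as np

abar = np.array([-5.0, -10.0, -50.0])  # ellipsoid center (wind, solar, gas)
P = np.diag([5.0, 10.0, 0.0])          # semi-axis lengths from the example
x = np.array([0.5, 0.5, 0.8])          # some dispatch fractions (illustrative)

v = P.T @ x
u_star = v / np.linalg.norm(v)         # maximizer of u^T v over ||u||_2 <= 1
worst = (abar + P @ u_star) @ x        # worst-case a^T x over the ellipsoid

closed_form = abar @ x + np.linalg.norm(P.T @ x)   # right-hand side of (55)

# any other feasible u must do no better than u_star
u_other = np.array([0.6, -0.8, 0.0])   # unit vector, illustrative
val_other = (abar + P @ u_other) @ x
```
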
Robust Dispatch w/ High Penetration of Renewables

Robust LP → SOCP, a sub-class of convex optimization problems:

minimize f^T x    (57)
subject to ā^T x + ‖P^T x‖_2 ≤ b    (58)
0 ≤ x ≤ 1    (59)

Note: the additional norm term acts as a regularization term. Namely, it prevents x from being large in directions with considerable uncertainty.
A Stochastic Approach to Robust Programming

In the previous example, we optimized w.r.t. the worst case scenario. Some would argue this is too conservative. That is, we could allow constraint violations in very low probability situations. This motivates "chance constraints".

Recall the LP

minimize c^T x    (60)
subject to a^T x ≤ b    (61)

Assume a ∈ R^n is a Gaussian random vector, i.e. a ~ N(ā, Σ). Then a^T x is a Gaussian random variable, with
mean ā^T x
variance x^T Σ x

Express the probability that a^T x ≤ b is satisfied as

Pr(a^T x ≤ b) = Φ( (b − ā^T x) / ‖Σ^{1/2} x‖_2 )    (62)

where Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−y^2/2} dy is the CDF of N(0, 1).
Chance Constraints

This enables us to relax (61) into

minimize c^T x    (63)
subject to Pr(a^T x ≤ b) ≥ η  ["chance constraint"]    (64)

In words, we require that a^T x ≤ b is satisfied with a reliability of η, where η is typically 0.9, 0.95, or 0.99.

Interestingly, we can use (62) to convert this stochastic LP into an SOCP:

Pr(a^T x ≤ b) = Φ( (b − ā^T x) / ‖Σ^{1/2} x‖_2 ) ≥ η    (65)
(b − ā^T x) / ‖Σ^{1/2} x‖_2 ≥ Φ^{−1}(η)    (66)
Φ^{−1}(η) · ‖Σ^{1/2} x‖_2 ≤ b − ā^T x    (67)

where we recognize the final inequality as a second order cone constraint
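Formula (62) can be sanity-checked by Monte Carlo. The numbers below are illustrative, and Φ is built from math.erf to avoid extra dependencies:

```python
import math
import numpy as np

abar = np.array([1.0, 2.0])        # mean of a (illustrative)
Sigma = np.diag([0.25, 0.04])      # covariance of a (illustrative, diagonal)
x = np.array([1.0, 1.0])
b = 3.5

# analytic probability via (62); Sigma is diagonal, so the elementwise
# sqrt of Sigma equals the matrix square root Sigma^{1/2}
z = (b - abar @ x) / np.linalg.norm(np.sqrt(Sigma) @ x)
Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
p_analytic = Phi(z)

# Monte Carlo estimate of Pr(a^T x <= b)
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(abar, Sigma, size=200_000)
p_mc = np.mean(samples @ x <= b)
```

The two estimates agree to within sampling error, confirming that a^T x is Gaussian with mean ā^T x and standard deviation ‖Σ^{1/2} x‖_2.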
Chance Constrained LP → SOCP

Thus, we converted the chance constrained LP into the SOCP

minimize c^T x    (68)
subject to ā^T x + Φ^{−1}(η) ‖Σ^{1/2} x‖_2 ≤ b    (69)

where Φ^{−1}(·) is the inverse CDF.

Note that Φ^{−1}(η) must be ≥ 0 for (69) to be a valid second order cone constraint. This is true if η ≥ 0.5. In practice, we almost always want reliability ≥ 0.5.

Comments:
the approach extends to chance constrained convex QPs
the approach does NOT directly extend to non-Gaussian distributions
much more elegant & efficient than two-stage stochastic programming

Want to learn more about robust programming? Read:
A. Ben-Tal, L. El Ghaoui, A. Nemirovski. Robust Optimization, Princeton University Press, 2009
Chance Constrained LP→ SOCP
thus, we converted chance constrained LP into the SOCP
minimize cTx (68)
subject to aTx + Φ−1(η)‖Σ1/2x‖2 ≤ b (69)
where Φ−1(·) is the inverse CDF
Note Φ−1(η) ≥ 0 to be valid second order cone constraint. True, if η ≥ 0.5. Inpractice, we almost always want reliability ≥ 0.5.
Comments:
approach extends to chanced constrained convex QPs
approach does NOT directly extend to non-Gaussian distributions
much more elegant & efficient than two-stage stochastic programming
Want to learn more about robust programming? ReadR. Bental, L. El Ghaoui, A. Nemirovski. Robust Optimization, PrincetonUniversity Press, 2009
Prof. Moura | Tsinghua-Berkeley Shenzhen Institute ENE 2XX | LEC 02 - Convex Programs Slide 50
Ex: Portfolio Optimization

Recall the portfolio optimization problem:

- x ∈ Rⁿ indicates the portfolio allocation; xᵢ is the fraction invested in asset i
- x must satisfy 1ᵀx = 1, x ⪰ 0
- the return (in percentage) is given by pᵀx, where p ∈ Rⁿ and p ∼ N(p̄, Σ)

Objective: Maximize expected return, subject to a limit on the probability of loss

    minimize    −E[pᵀx]                                        (70)
    subject to  Pr[pᵀx ≥ 1] ≥ η                                (71)
                1ᵀx = 1,  x ⪰ 0                                (72)

This can be recast as the SOCP

    minimize    −p̄ᵀx                                           (73)
    subject to  Φ⁻¹(η) · ‖Σ^{1/2}x‖₂ ≤ p̄ᵀx − 1                 (74)
                1ᵀx = 1,  x ⪰ 0                                (75)
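A minimal numerical sketch of (73)–(75), using SciPy's general-purpose SLSQP solver rather than a dedicated conic solver; the three-asset data are made up, with the third asset nearly risk-free:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Illustrative data for 3 assets (made up, not from the slides)
p_bar = np.array([1.10, 1.05, 1.01])          # mean returns
Sigma = np.diag([0.05, 0.01, 1e-8])           # return covariance; asset 3 ~ risk-free
eta = 0.9
k = norm.ppf(eta)                             # Phi^{-1}(eta) > 0 since eta > 0.5
L = np.linalg.cholesky(Sigma)

constraints = [
    # Chance constraint in SOC form, Eq. (74): k * ||Sigma^{1/2} x||_2 <= p_bar^T x - 1
    {"type": "ineq", "fun": lambda x: p_bar @ x - 1 - k * np.linalg.norm(L.T @ x)},
    # Budget constraint, Eq. (75): 1^T x = 1
    {"type": "eq", "fun": lambda x: np.sum(x) - 1},
]

# Start from the all-risk-free allocation, which is feasible
res = minimize(lambda x: -(p_bar @ x), x0=np.array([0.0, 0.0, 1.0]),
               bounds=[(0, 1)] * 3, constraints=constraints, method="SLSQP")

x_opt = res.x
print("allocation:", np.round(x_opt, 3), "expected return:", p_bar @ x_opt)
```

The solver trades some of the risk-free asset for the riskier, higher-return assets until the loss-probability constraint becomes active.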
Outline
1 Convex Programming
2 Linear Programming
3 Quadratic Programming
4 Second Order Cone Programming
5 Robust Programming & Chance Constraints
6 Maximum Likelihood Estimation
Problem Setting

Problem Statement: You are provided m data points for a random variable y. Goal: fit a probability distribution to this data.

Notation:

- p(y; θ) is the probability density function for y
- θ ∈ Rⁿ are the free parameters
- the "likelihood function" is p(y; θ) considered as a function of θ, for a fixed value of y
- the "log-likelihood function" is l(θ) = log p(y; θ)

Maximum Likelihood Estimation:

    θ⋆ = arg max_θ l(θ)                                        (76)

Remark: Interestingly, (76) is a convex optimization problem for many common scenarios.
MLE for Linear Models

Consider a linear measurement model:

    yᵢ = θᵀφᵢ + vᵢ,   i = 1, ⋯, m                              (77)

- θ ∈ Rⁿ are the parameters to be estimated
- yᵢ ∈ R are the measured data points
- φᵢ ∈ Rⁿ are the regressors
- vᵢ ∈ R are the measurement errors or noise. Assume the vᵢ are IID with probability density p(·)

The likelihood function, given all the measured points yᵢ and regressors φᵢ, is the product of the likelihoods of the individual data points (yᵢ, φᵢ):

    p(v; θ) = ∏_{i=1}^m p(yᵢ − θᵀφᵢ)                           (78)

The log-likelihood function is then

    l(θ) = log p(v; θ) = ∑_{i=1}^m log p(yᵢ − θᵀφᵢ)            (79)

(recall log(a · b) = log(a) + log(b))
MLE for Linear Models & Gaussian Noise

The MLE problem is

    maximize_θ  ∑_{i=1}^m log p(yᵢ − θᵀφᵢ)                     (80)

The likelihood function is log-concave for several common distributions.

Suppose that vᵢ ∼ N(0, σ²), so that p(v) = (2πσ²)^{−1/2} · e^{−v²/(2σ²)}. Then

    ∑_{i=1}^m log p(vᵢ) = ∑_{i=1}^m [ −(1/2) log(2πσ²) − vᵢ²/(2σ²) ]       (81)
                        = −(m/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^m vᵢ²       (82)

    ∑_{i=1}^m log p(yᵢ − θᵀφᵢ) = −(m/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^m (yᵢ − θᵀφᵢ)²   (83)
                               = −(m/2) log(2πσ²) − (1/(2σ²)) ‖Φᵀθ − Y‖₂²              (84)

where Φ = [φ₁, ⋯, φ_m] ∈ R^{n×m} and Y = [y₁; ⋯; y_m] ∈ R^m.
MLE for Linear Models & Gaussian Noise

The MLE problem with Gaussian noise is a least squares problem!

    θ⋆ = arg min_θ ‖Φᵀθ − Y‖₂²                                 (85)

Exercise: Derive the MLE optimization formulation for (77) for the following distributions for vᵢ:

1. Laplacian noise distribution: p(v) = 1/(2a) · e^{−|v|/a}
2. Uniform noise distribution: p(v) = 1/(2a) on [−a, +a] and zero elsewhere
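A quick numerical illustration of (85) with synthetic data (all values made up): generate y from the linear model (77) with Gaussian noise, then recover θ with NumPy's least squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data for the linear model (77): y_i = theta^T phi_i + v_i
theta_true = np.array([2.0, -1.0, 0.5])
m, n = 200, 3
Phi = rng.normal(size=(n, m))                        # regressors phi_i as columns of Phi
Y = Phi.T @ theta_true + 0.1 * rng.normal(size=m)    # Gaussian noise, sigma = 0.1

# MLE under Gaussian noise = least squares, Eq. (85)
theta_hat, *_ = np.linalg.lstsq(Phi.T, Y, rcond=None)
print("theta_hat:", np.round(theta_hat, 2))
```

With m = 200 points and modest noise, the estimate lands very close to the true θ.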
MLE for Linear Models & Laplacian Noise

Laplace distribution:

    p(v) = 1/(2a) · e^{−|v|/a}

[Figure: Laplace distribution densities. Image source: wikipedia.org]

    ∑_{i=1}^m log p(vᵢ) = ∑_{i=1}^m [ log(1/(2a)) − |vᵢ|/a ]
                        = m log(1/(2a)) − (1/a) ∑_{i=1}^m |vᵢ|             (87)

    ∑_{i=1}^m log p(yᵢ − θᵀφᵢ) = m log(1/(2a)) − (1/a) ∑_{i=1}^m |yᵢ − θᵀφᵢ|   (88)
                               = m log(1/(2a)) − (1/a) ‖Φᵀθ − Y‖₁             (89)

Hence the MLE problem with Laplacian noise is an L1-norm minimization problem:

    θ⋆ = arg min_θ ‖Φᵀθ − Y‖₁                                  (86)
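The L1 problem (86) is not an LP as written, but the standard epigraph trick makes it one: introduce t ∈ R^m with −t ≤ Φᵀθ − Y ≤ t and minimize 1ᵀt. A sketch with SciPy's linprog on synthetic, made-up data:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)

# Synthetic data: linear model (77) with Laplacian noise
theta_true = np.array([1.0, -2.0])
m, n = 300, 2
Phi = rng.normal(size=(n, m))
Y = Phi.T @ theta_true + rng.laplace(scale=0.1, size=m)

# LAD as an LP:  min 1^T t  s.t.  -t <= Phi^T theta - Y <= t
# Decision vector z = [theta (n entries); t (m entries)]
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[ Phi.T, -np.eye(m)],    #  Phi^T theta - t <=  Y
                 [-Phi.T, -np.eye(m)]])   # -Phi^T theta - t <= -Y
b_ub = np.concatenate([Y, -Y])
bounds = [(None, None)] * n + [(0, None)] * m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

theta_hat = res.x[:n]
print("theta_hat:", np.round(theta_hat, 2))
```

At the optimum each tᵢ equals |yᵢ − θᵀφᵢ|, so the LP objective equals ‖Φᵀθ − Y‖₁.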
Logistic Regression (with application to discrete choice models)

Consider a binary random variable y ∈ {0, 1}, with

    Pr[y = 1] = p,    Pr[y = 0] = 1 − p                        (90)

For example:

- y = 1 corresponds to "charge EV at Nanshan iPark chg station"
- y = 0 corresponds to "DO NOT charge EV at Nanshan iPark chg station"

Hypothesize that the probability p is a function of the EV driver's utility function, i.e. p(U). The utility function is

    U = aᵀφ + b                                                (91)

where a ∈ Rⁿ, b ∈ R are free parameters and φ ∈ Rⁿ are "explanatory" variables. Example explanatory variables:

- day of week
- time of day
- charging price
- whether a parking space is open

We relate the probability p to the utility U using the logistic model:

    p(U) = e^U / (1 + e^U) = e^{aᵀφ+b} / (1 + e^{aᵀφ+b})       (92)
Logistic Regression (with application to discrete choice models)

Logistic model:

    p(U) = e^U / (1 + e^U)

Objective: Given explanatory variables φ₁, ⋯, φ_m ∈ Rⁿ with corresponding outcomes y₁, ⋯, y_m ∈ {0, 1}, find the free parameters a, b via MLE.

Re-order the data so that

- φ₁, ⋯, φ_q correspond to outcome y = 1
- φ_{q+1}, ⋯, φ_m correspond to outcome y = 0
Likelihood function:

    p(y; a, b) = ∏_{i=1}^q pᵢ · ∏_{i=q+1}^m (1 − pᵢ)                       (93)

Log-likelihood function:

    l(a, b) = ∑_{i=1}^q log pᵢ + ∑_{i=q+1}^m log(1 − pᵢ)                   (94)
            = ∑_{i=1}^q log( e^{Uᵢ} / (1 + e^{Uᵢ}) ) + ∑_{i=q+1}^m log( 1 / (1 + e^{Uᵢ}) )   (95)
            = ∑_{i=1}^q [ Uᵢ − log(1 + e^{Uᵢ}) ] − ∑_{i=q+1}^m log(1 + e^{Uᵢ})               (96)
            = ∑_{i=1}^q Uᵢ − ∑_{i=1}^m log(1 + e^{Uᵢ})                                       (97)
            = ∑_{i=1}^q [ aᵀφᵢ + b ] − ∑_{i=1}^m log( 1 + e^{aᵀφᵢ + b} )                     (98)

where this last expression is a concave function w.r.t. (a, b). Solve via CP.
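A minimal sketch of maximizing (98) numerically (equivalently, minimizing its negative) with SciPy's BFGS on synthetic, made-up data. The code keeps y in its original order, using ∑ yᵢUᵢ in place of the re-ordered ∑_{i=1}^q Uᵢ, and uses np.logaddexp(0, U) as a numerically stable log(1 + e^U):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Synthetic discrete-choice data: U = a^T phi + b, Pr[y = 1] = e^U / (1 + e^U)
a_true, b_true = np.array([1.5, -2.0]), 0.5
m = 2000
phi = rng.normal(size=(m, 2))
U_true = phi @ a_true + b_true
y = (rng.random(m) < 1.0 / (1.0 + np.exp(-U_true))).astype(float)

# Negative log-likelihood, i.e. minus Eq. (98); convex in (a, b)
def nll(params):
    a, b = params[:2], params[2]
    U = phi @ a + b
    return np.sum(np.logaddexp(0.0, U) - y * U)

res = minimize(nll, x0=np.zeros(3), method="BFGS")
a_hat, b_hat = res.x[:2], res.x[2]
print("a_hat:", np.round(a_hat, 2), "b_hat:", round(b_hat, 2))
```

Because the log-likelihood is concave, any local minimizer of the negative log-likelihood found here is the global MLE.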