LijunZhang [email protected] //cs.nju.edu.cn/zlj/Course/DM_15_Lecture/Lecture... · · 2018-02-24Convex Optimization LijunZhang [email protected] Modification of boyd/cvxbook/bv_cvxslides.pdf

Convex Optimization

Lijun Zhang

[email protected]

http://cs.nju.edu.cn/zlj

Modification of http://stanford.edu/~boyd/cvxbook/bv_cvxslides.pdf

Outline

Introduction

Convex Sets & Functions

Convex Optimization Problems

Duality

Convex Optimization Methods

Summary

Mathematical Optimization

Optimization Problem

Applications

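Dimensionality Reduction (PCA)

$\max_{w \in \mathbb{R}^d} \; w^\top C w \quad \text{s.t.} \quad \|w\|_2^2 = 1$

Clustering (NMF)

$\min_{U \in \mathbb{R}^{n \times k},\, V \in \mathbb{R}^{d \times k}} \; \|X - U V^\top\|_F^2 \quad \text{s.t.} \quad U \ge 0,\ V \ge 0$

Classification (SVM)

$\min_{w \in \mathbb{R}^d,\, b \in \mathbb{R}} \; \sum_{i=1}^{n} \max\left(0,\, 1 - y_i (w^\top x_i + b)\right) + \frac{\lambda}{2} \|w\|_2^2$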

Least-squares

The Problem

Given $a_i$, predict $b_i$ by $a_i^\top x$

Properties
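One well-known property of least-squares is that it has an analytical solution: when the columns of $A$ are linearly independent, the minimizer of $\|Ax - b\|_2^2$ solves the normal equations $A^\top A x = A^\top b$. The sketch below illustrates this for two unknowns in plain Python. The data, the notation ($A$, $b$, $x$, following Boyd and Vandenberghe) and the helper name `solve_normal_equations_2d` are assumptions made for illustration, not taken from the slides.

```python
# Hypothetical data: rows a_i = (t_i, 1) and targets b_i generated by b = 2 t + 1.
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
A = [[t, 1.0] for t in ts]
b = [2.0 * t + 1.0 for t in ts]

def solve_normal_equations_2d(A, b):
    """Solve min_x sum_i (a_i^T x - b_i)^2 for two unknowns via A^T A x = A^T b."""
    # Entries of the 2x2 Gram matrix A^T A and of the vector A^T b.
    g11 = sum(r[0] * r[0] for r in A)
    g12 = sum(r[0] * r[1] for r in A)
    g22 = sum(r[1] * r[1] for r in A)
    h1 = sum(r[0] * bi for r, bi in zip(A, b))
    h2 = sum(r[1] * bi for r, bi in zip(A, b))
    det = g11 * g22 - g12 * g12  # nonzero when the columns of A are independent
    # Cramer's rule for the 2x2 linear system.
    return [(h1 * g22 - g12 * h2) / det, (g11 * h2 - g12 * h1) / det]

x = solve_normal_equations_2d(A, b)
residual = sum((r[0] * x[0] + r[1] * x[1] - bi) ** 2 for r, bi in zip(A, b))
```

Because the hypothetical targets lie exactly on a line, the recovered `x` is the slope and intercept of that line and the residual is zero up to rounding. For larger problems a library routine based on QR or SVD is usually preferred over forming $A^\top A$ explicitly.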

Linear Programming

The Problem

Properties

Convex Optimization Problem

The Problem

Conditions


The Problem

Properties

Nonlinear Optimization

Definition

The objective or constraint functions are not linear

Could be convex or nonconvex

Outline

Introduction

Convex Sets & Functions

Convex Optimization Problems

Duality

Convex Optimization Methods

Summary

Affine Set

Convex Set

Convex Cone

Some Examples (1)

Some Examples (2)

Some Examples (3)

Operations that Preserve Convexity

Intersection

Affine Function

Perspective and Linear-Fractional Function

Convex Functions

Examples on $\mathbb{R}$

Examples on $\mathbb{R}^n$ and $\mathbb{R}^{m \times n}$

Restriction of a Convex Function to a Line

First-order Conditions

Second-order Conditions

Examples

Operations that Preserve Convexity

Positive Weighted Sum & Composition with Affine Function

Pointwise Maximum

Hinge loss: $\ell(z) = \max(0,\, 1 - z)$
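The hinge loss is a pointwise maximum of two affine functions (the constant $0$ and $1 - z$), which is why it is convex. A minimal sketch, assuming the scalar form $\max(0, 1 - z)$, spot-checks the defining inequality of convexity on a small grid. The helper name `violates_convexity` and the grid values are hypothetical, and the check is a sanity test rather than a proof.

```python
def hinge(z):
    """Hinge loss max(0, 1 - z): the pointwise maximum of two affine functions of z."""
    return max(0.0, 1.0 - z)

def violates_convexity(f, u, v, theta, tol=1e-12):
    """Return True if f(theta*u + (1-theta)*v) > theta*f(u) + (1-theta)*f(v)."""
    lhs = f(theta * u + (1.0 - theta) * v)
    rhs = theta * f(u) + (1.0 - theta) * f(v)
    return lhs > rhs + tol

# Spot-check the convexity inequality on a small grid of points and weights.
points = [-2.0, -0.5, 0.0, 0.5, 1.0, 1.5, 3.0]
thetas = [0.0, 0.25, 0.5, 0.75, 1.0]
any_violation = any(
    violates_convexity(hinge, u, v, t)
    for u in points
    for v in points
    for t in thetas
)
```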

Pointwise Supremum

Composition with Scalar Functions

Vector Composition

Minimization

Perspective

The Conjugate Function

Examples

Outline

Introduction

Convex Sets & Functions

Convex Optimization Problems

Duality

Convex Optimization Methods

Summary

Optimization Problem in Standard Form

Optimal and Locally Optimal Points

Implicit Constraints


Example

Local and Global Optima

Optimality Criterion for Differentiable $f_0$

Examples

Popular Convex Problems

Linear Program (LP)

Linear-fractional Program

Quadratic Program (QP)

Quadratically Constrained Quadratic Program (QCQP)

Second-order Cone Programming (SOCP)

Geometric Programming (GP)

Semidefinite Program (SDP)

Outline

Introduction

Convex Sets & Functions

Convex Optimization Problems

Duality

Convex Optimization Methods

Summary

Lagrangian

Lagrangian

Lagrange Dual Function

Lagrange Dual Function

Least-norm Solution of Linear Equations
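As a hedged illustration of the dual function for the least-norm problem, the sketch below assumes the formulation from Boyd and Vandenberghe, minimize $x^\top x$ subject to $Ax = b$, specialized to a single constraint row $a$ so that it runs in plain Python. In that case $g(\nu) = -\tfrac{1}{4}\nu^2 a^\top a - b\nu$. The numbers and the names `dual_function`, `x_star`, `nu_star` are hypothetical.

```python
a = [1.0, 2.0, 2.0]  # hypothetical single constraint row, so A = a^T
b = 9.0              # hypothetical right-hand side

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def dual_function(nu):
    """g(nu) = inf_x [x^T x + nu (a^T x - b)] = -(1/4) nu^2 a^T a - b nu."""
    return -0.25 * nu * nu * dot(a, a) - b * nu

# Primal optimum: the least-norm point on the hyperplane a^T x = b.
x_star = [b * ai / dot(a, a) for ai in a]
p_star = dot(x_star, x_star)

# Dual optimum: maximizer of the concave quadratic g.
nu_star = -2.0 * b / dot(a, a)
d_star = dual_function(nu_star)
```

Every value of `dual_function` is a lower bound on `p_star`, and at `nu_star` the bound is tight, so this small instance shows both the lower-bound property and a zero duality gap.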

Lagrange Dual and Conjugate Function

The Dual Problem

Weak and Strong Duality

Weak and Strong Duality

Slater’s Constraint Qualification

Complementary Slackness

Karush-Kuhn-Tucker (KKT) Conditions

KKT Conditions for Convex Problem
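For reference, the KKT conditions can be written out as follows. This is the standard statement, assuming the standard-form problem with objective $f_0$, inequality constraints $f_i(x) \le 0$ for $i = 1, \dots, m$, equality constraints $h_i(x) = 0$ for $i = 1, \dots, p$, and differentiable functions; the notation follows Boyd and Vandenberghe and may differ from the symbols used on the original slide.

```latex
\begin{aligned}
& f_i(x^\star) \le 0,\ i = 1, \dots, m, \qquad h_i(x^\star) = 0,\ i = 1, \dots, p
  && \text{(primal feasibility)} \\
& \lambda_i^\star \ge 0,\ i = 1, \dots, m
  && \text{(dual feasibility)} \\
& \lambda_i^\star f_i(x^\star) = 0,\ i = 1, \dots, m
  && \text{(complementary slackness)} \\
& \nabla f_0(x^\star) + \sum_{i=1}^{m} \lambda_i^\star \nabla f_i(x^\star)
  + \sum_{i=1}^{p} \nu_i^\star \nabla h_i(x^\star) = 0
  && \text{(stationarity)}
\end{aligned}
```

For a convex problem, any points $x^\star$, $(\lambda^\star, \nu^\star)$ satisfying these conditions are primal and dual optimal with zero duality gap.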

An Example—SVM (1)

The Optimization Problem

Define the hinge loss as

Its Conjugate Function is


The Optimization Problem becomes

It is Equivalent to

The Lagrangian is


The Lagrange Dual Function is

Minimize one by one


Finally, We Obtain

The Dual Problem is





Can be used to recover ∗ from ∗



Can be used to recover ∗ from ∗

Outline

Introduction

Convex Sets & Functions

Convex Optimization Problems

Duality

Convex Optimization Methods

Summary

More Assumptions

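Lipschitz continuous

$|f(x) - f(y)| \le G \|x - y\|_2$

Strong Convexity

$f(y) \ge f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{\lambda}{2} \|y - x\|_2^2$

Smooth

$f(y) \le f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{L}{2} \|y - x\|_2^2$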

Performance Measure

The Problem

Convergence Rate

Iteration Complexity

Gradient-based Methods

The Convergence Rate

GD: Gradient Descent

AGD: Nesterov’s Accelerated Gradient Descent [Nesterov, 2005, Nesterov, 2007, Tseng, 2008]

EGD: Epoch Gradient Descent [Hazan and Kale, 2011]

SGD$_\alpha$: SGD with $\alpha$-suffix Averaging [Rakhlin et al., 2012]

Gradient Descent (1)

Move along the opposite direction of gradients
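The update "move along the opposite direction of gradients" can be sketched as $x_{t+1} = x_t - \eta \nabla f(x_t)$. The example below is a minimal illustration with a fixed step size; the quadratic objective, the step size `0.1`, and the function names are assumptions chosen for the demonstration, not the settings analyzed in the slides.

```python
def gradient_descent(grad, x0, step_size, num_iters):
    """Iterate x_{t+1} = x_t - step_size * grad(x_t) and return the final point."""
    x = list(x0)
    for _ in range(num_iters):
        g = grad(x)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical objective: f(x) = (x1 - 1)^2 + 2 (x2 + 3)^2, minimized at (1, -3).
def grad_f(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)]

x_final = gradient_descent(grad_f, x0=[0.0, 0.0], step_size=0.1, num_iters=200)
```

With this step size each coordinate error shrinks by a constant factor per iteration (0.8 and 0.6 respectively), so the iterates approach the minimizer geometrically.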

Gradient Descent (2)

Gradient Descent with Projection

Projection Operator
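When the domain is constrained, each gradient step is followed by a Euclidean projection back onto the feasible set. A minimal sketch, assuming the feasible set is an $\ell_2$ ball of a given radius (a choice made here because its projection has a closed form; the slides may use a different domain). The objective and all names are hypothetical.

```python
import math

def project_onto_ball(x, radius):
    """Euclidean projection onto the set {x : ||x||_2 <= radius}."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= radius:
        return list(x)
    return [radius * xi / norm for xi in x]

def projected_gradient_descent(grad, x0, step_size, num_iters, radius):
    """Gradient step followed by projection, repeated num_iters times."""
    x = project_onto_ball(x0, radius)
    for _ in range(num_iters):
        g = grad(x)
        y = [xi - step_size * gi for xi, gi in zip(x, g)]
        x = project_onto_ball(y, radius)
    return x

# Hypothetical objective: f(x) = ||x - c||_2^2 with c outside the unit ball.
c = [3.0, 4.0]

def grad_f(x):
    return [2.0 * (xi - ci) for xi, ci in zip(x, c)]

x_final = projected_gradient_descent(grad_f, [0.0, 0.0], 0.1, 100, radius=1.0)
```

For this objective the constrained minimizer is the projection of `c` onto the unit ball, which is the point `c / ||c||_2 = (0.6, 0.8)`.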

Analysis (1)

Analysis (2)

Analysis (3)

A Key Step (1)

Evaluate the Gradient or Subgradient

Logit loss

A Key Step (1)


Hinge loss

A Key Step (2)


Hinge loss

A Key Step (3)


Hinge loss
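The key step of evaluating a gradient or subgradient can be sketched for the two losses named above. The code assumes the common forms $\log(1 + \exp(-y\, w^\top x))$ for the logit loss and $\max(0, 1 - y\, w^\top x)$ for the hinge loss, with a label $y \in \{-1, +1\}$; the slides' exact notation is not recoverable, so these forms and the function names are assumptions.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def logit_loss_grad(w, x, y):
    """Gradient of log(1 + exp(-y * w^T x)) with respect to w."""
    margin = y * dot(w, x)
    coeff = -y * sigmoid(-margin)  # equals -y / (1 + exp(margin))
    return [coeff * xi for xi in x]

def hinge_loss_subgrad(w, x, y):
    """One subgradient of max(0, 1 - y * w^T x) with respect to w."""
    if y * dot(w, x) < 1.0:
        return [-y * xi for xi in x]  # the branch 1 - y * w^T x is active
    return [0.0 for _ in x]           # flat branch; 0 is also valid at the kink

g_logit = logit_loss_grad([0.0, 0.0], [1.0, 2.0], 1.0)
g_hinge = hinge_loss_subgrad([0.0, 0.0], [1.0, 2.0], 1.0)
```

The hinge loss is not differentiable where $y\, w^\top x = 1$; any convex combination of the two branch gradients is a valid subgradient there, and this sketch simply returns zero.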

Outline

Introduction

Convex Sets & Functions

Convex Optimization Problems

Duality

Convex Optimization Methods

Summary

Summary

Convex Sets & Functions: Definitions, Operations that Preserve Convexity

Convex Optimization Problems: Definitions, Optimality Criterion

Duality: Lagrange, Dual Problem, KKT Conditions

Convex Optimization Methods: Gradient-based Methods

Reference (1)

Hazan, E. and Kale, S. (2011). Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization. In Proceedings of the 24th Annual Conference on Learning Theory, pages 421–436.

Nesterov, Y. (2005). Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152.

Nesterov, Y. (2007). Gradient methods for minimizing composite objective function. CORE Discussion Papers.

Reference (2)

Tseng, P. (2008). On accelerated proximal gradient methods for convex-concave optimization. Technical report, University of Washington.

Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.

Rakhlin, A., Shamir, O., and Sridharan, K. (2012). Making gradient descent optimal for strongly convex stochastic optimization. In Proceedings of the 29th International Conference on Machine Learning, pages 449–456.
