
Non-convex optimization

Issam Laradji

Strongly Convex

[Plot: objective function f(x) vs. x]

Assumptions:
Gradient Lipschitz continuous
Strongly convex

Randomized coordinate descent
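One standard way to write these two assumptions, and the linear rate gradient descent then achieves (stated here for reference; the exact constants used on the slides may differ):

    % L-Lipschitz gradient
    \|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|
    % mu-strong convexity
    f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \tfrac{\mu}{2}\|y - x\|^2
    % Gradient descent with step size 1/L then converges linearly:
    f(x_k) - f^* \le \Bigl(1 - \tfrac{\mu}{L}\Bigr)^k \bigl(f(x_0) - f^*\bigr)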

Non-strongly Convex optimization

Assumptions:
Gradient Lipschitz continuous

Convergence rate
Sublinear, compared to the (linear) strongly convex convergence rate

Non-strongly Convex optimization

Assumptions:
Objective function Lipschitz continuous
Restricted secant inequality

Randomized coordinate descent (see the sketch below)
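A minimal sketch of randomized coordinate descent: at each iteration one coordinate is picked uniformly at random and updated with step size 1/L_j. The quadratic test problem and the per-coordinate Lipschitz constants are assumptions made for illustration, not taken from the slides.

    import numpy as np

    def randomized_coordinate_descent(grad, L_coords, x0, iters=1000, rng=None):
        """Randomized coordinate descent: pick one coordinate j uniformly at
        random and take a gradient step of size 1/L_j along it."""
        rng = np.random.default_rng() if rng is None else rng
        x = x0.copy()
        for _ in range(iters):
            j = rng.integers(len(x))          # pick a random coordinate
            g_j = grad(x)[j]                  # j-th partial derivative
            x[j] -= g_j / L_coords[j]         # coordinate step with step size 1/L_j
        return x

    # Assumed example: quadratic f(x) = 0.5 * x^T A x - b^T x
    A = np.array([[3.0, 0.5], [0.5, 2.0]])
    b = np.array([1.0, -1.0])
    grad = lambda x: A @ x - b
    L_coords = np.diag(A)                     # per-coordinate Lipschitz constants of a quadratic
    x_hat = randomized_coordinate_descent(grad, L_coords, np.zeros(2), iters=5000)
    print(x_hat, np.linalg.solve(A, b))       # the two should be close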

Invex functions (a generalization of convex functions)

Assumptions:
Objective function Lipschitz continuous
Polyak [1963] - for invex functions where this holds

This inequality simply requires that the gradient grows faster than a linear function as we move away from the optimal function value.

Invex function (one global minimum)

Randomized coordinate descent
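The Polyak [1963] inequality referred to above is usually written as follows (μ > 0, f* the optimal value); combined with an L-Lipschitz gradient it yields the same linear rate as in the strongly convex case:

    \tfrac{1}{2}\,\|\nabla f(x)\|^2 \ge \mu\,\bigl(f(x) - f^*\bigr) \quad \text{for all } x
    % With an L-Lipschitz gradient, gradient descent with step size 1/L satisfies
    f(x_k) - f^* \le \Bigl(1 - \tfrac{\mu}{L}\Bigr)^k \bigl(f(x_0) - f^*\bigr)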

[Venn diagram: relationship between the Polyak, Convex, and Invex function classes]

Non-convex functions

[Plot: non-convex function with local maxima, local minima, and a global minimum]

Strategy 1: local optimization of the non-convex function

Assumptions for local non-convex optimization:
Lipschitz continuous
Locally convex

All convex function rates apply.

Local randomized coordinate descent

Issue: dealing with saddle points

Strategy 2: Global optimization of the non-convex function

Issue: Exponential number of saddle points

Local non-convex optimization

● Gradient Descent
  ○ Difficult to define a proper step size
● Newton's method
  ○ Solves the slowness problem by rescaling the gradients in each direction with the inverse of the corresponding eigenvalues of the Hessian
  ○ Can result in moving in the wrong direction (negative eigenvalues)
● Saddle-Free Newton's method
  ○ Rescales gradients by the absolute value of the inverse Hessian and the Hessian's Lanczos vectors (see the sketch after this list)
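A minimal dense sketch of the saddle-free Newton idea, using a full eigendecomposition in place of the Lanczos-based low-rank approximation; the test function is an assumption chosen so that it has a saddle at the origin.

    import numpy as np

    def saddle_free_newton_step(grad, hess, x, eps=1e-6):
        """Rescale the gradient by |H|^{-1}: eigendecompose the Hessian, replace
        each eigenvalue by its absolute value, and invert."""
        H = hess(x)
        eigvals, eigvecs = np.linalg.eigh(H)                # H is symmetric
        abs_inv = 1.0 / np.maximum(np.abs(eigvals), eps)    # |lambda_i|^{-1}, guarded near zero
        step = eigvecs @ (abs_inv * (eigvecs.T @ grad(x)))
        return x - step

    # Assumed test function: f(x, y) = x^2 - y^2 + 0.1*y^4, saddle at the origin
    grad = lambda x: np.array([2 * x[0], -2 * x[1] + 0.4 * x[1] ** 3])
    hess = lambda x: np.array([[2.0, 0.0], [0.0, -2.0 + 1.2 * x[1] ** 2]])
    x = np.array([0.5, 0.1])
    for _ in range(20):
        x = saddle_free_newton_step(grad, hess, x)
    print(x)   # converges to a minimum rather than to the saddle at the origin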

Local non-convex optimization

● Random stochastic gradient descent
  ○ Sample noise r uniformly from the unit sphere
  ○ Escapes saddle points, but the step size is difficult to determine (see the sketch after this list)
● Cubic regularization [Nesterov 2006]
  ○ Assumes the gradient is Lipschitz continuous
  ○ Assumes the Hessian is Lipschitz continuous
● Momentum
  ○ Can help escape saddle points (rolling ball)
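A minimal sketch of the "add noise from the unit sphere" idea; the step size and noise scale are assumptions, and they are exactly the quantities the slide notes are hard to choose.

    import numpy as np

    def perturbed_gradient_descent(grad, x0, step=0.1, noise_scale=0.01,
                                   iters=2000, rng=None):
        """Gradient descent plus a direction sampled uniformly from the unit
        sphere; the noise pushes iterates off saddle points where the gradient vanishes."""
        rng = np.random.default_rng() if rng is None else rng
        x = x0.copy()
        for _ in range(iters):
            r = rng.normal(size=x.shape)
            r /= np.linalg.norm(r)            # uniform direction on the unit sphere
            x -= step * grad(x) + noise_scale * r
        return x

    # Assumed test function: f(x, y) = x^2 - y^2 + 0.1*y^4 has a saddle at the origin
    grad = lambda x: np.array([2 * x[0], -2 * x[1] + 0.4 * x[1] ** 3])
    print(perturbed_gradient_descent(grad, np.zeros(2)))   # leaves the saddle at (0, 0)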

Global non-convex optimization

● Applications (usually for datasets with missing data)
  ○ Matrix completion
  ○ Image reconstruction
  ○ Recommendation systems
● Matrix completion problem [De Sa et al. 2015]
● Reformulate it as an unconstrained non-convex problem
● Gradient descent (a sketch follows below)
● Using the Riemannian manifold, we can derive the following
● This widely-used algorithm converges globally, using only random initialization
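A minimal sketch of one common reformulation: factor the matrix as U Vᵀ and run gradient descent on the squared error over the observed entries. The rank, step size, and synthetic data are assumptions, and this is not the specific algorithm of De Sa et al. [2015].

    import numpy as np

    def matrix_completion(M_obs, mask, rank=2, step=0.02, iters=5000, rng=None):
        """Unconstrained non-convex reformulation: min over U, V of the squared
        error on observed entries of M - U V^T, solved by plain gradient descent."""
        rng = np.random.default_rng(0) if rng is None else rng
        m, n = M_obs.shape
        U = 0.1 * rng.normal(size=(m, rank))
        V = 0.1 * rng.normal(size=(n, rank))
        for _ in range(iters):
            R = mask * (U @ V.T - M_obs)      # residual on observed entries only
            U -= step * (R @ V)               # gradient step w.r.t. U
            V -= step * (R.T @ U)             # gradient step w.r.t. V
        return U @ V.T

    # Assumed data: synthetic rank-2 matrix with about half of the entries observed
    rng = np.random.default_rng(1)
    M = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
    mask = rng.random(M.shape) < 0.5
    M_hat = matrix_completion(M * mask, mask, rank=2)
    print(np.linalg.norm(M_hat - M) / np.linalg.norm(M))   # relative recovery error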

Convex relaxation of non-convex optimization problems

● Convex Neural Networks [Bengio et al. 2006]
  ○ Single-hidden-layer network
  ○ The original neural network training problem is non-convex
  ○ Use alternating minimization (see the rough sketch after this list)
  ○ Potential issues with the activation function
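As a rough illustration of alternating minimization on a single-hidden-layer network: fix the hidden weights and solve the convex least-squares problem for the output weights, then take a gradient step on the hidden weights. This is a generic sketch under assumed choices (ReLU activation, synthetic data), not the specific convex formulation of Bengio et al. [2006].

    import numpy as np

    def alt_min_one_hidden_layer(X, y, hidden=10, outer=50, step=0.01, rng=None):
        """Alternate between (a) the convex least-squares problem for the output
        weights w given the hidden activations, and (b) a gradient step on the
        hidden weights W.  ReLU activation is an assumption."""
        rng = np.random.default_rng(0) if rng is None else rng
        n, d = X.shape
        W = 0.5 * rng.normal(size=(d, hidden))
        for _ in range(outer):
            H = np.maximum(X @ W, 0.0)                      # hidden activations (ReLU)
            w, *_ = np.linalg.lstsq(H, y, rcond=None)       # convex step: output weights
            resid = H @ w - y                               # prediction residual
            grad_W = X.T @ (np.outer(resid, w) * (H > 0))   # gradient w.r.t. hidden weights
            W -= step * grad_W / n
        return W, w

    # Assumed data: tiny synthetic regression problem
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
    W, w = alt_min_one_hidden_layer(X, y)
    H = np.maximum(X @ W, 0.0)
    print(np.mean((H @ w - y) ** 2))                        # training mean squared error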

Bayesian optimization (global non-convex optimization)

● Typically used for finding optimal hyperparameters
  ○ Determining the step size or the number of hidden layers for neural networks
  ○ The parameter values are bounded
● Other methods include sampling the parameter values uniformly at random
  ○ Grid-search (see the sketch after this list)
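For contrast, the "sample parameter values over a bounded box" baselines from this slide could look like the following; the toy objective and bounds are assumptions.

    import numpy as np

    def grid_search(objective, bounds, points_per_dim=10):
        """Evaluate the objective on a regular grid over the bounded box."""
        axes = [np.linspace(lo, hi, points_per_dim) for lo, hi in bounds]
        grid = np.array(np.meshgrid(*axes)).reshape(len(bounds), -1).T
        vals = np.array([objective(p) for p in grid])
        return grid[vals.argmin()], vals.min()

    def random_search(objective, bounds, n_samples=100, rng=None):
        """Sample parameter values uniformly at random inside the bounded box."""
        rng = np.random.default_rng() if rng is None else rng
        lo, hi = np.array(bounds).T
        samples = rng.uniform(lo, hi, size=(n_samples, len(bounds)))
        vals = np.array([objective(p) for p in samples])
        return samples[vals.argmin()], vals.min()

    # Assumed example: tune two bounded "hyperparameters" of a toy objective
    objective = lambda p: (p[0] - 0.3) ** 2 + 0.1 * np.sin(5 * p[1])
    bounds = [(0.0, 1.0), (0.0, 1.0)]
    print(grid_search(objective, bounds))
    print(random_search(objective, bounds))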

Bayesian optimization (global non-convex optimization)

● Fit a Gaussian process on the observed data (purple shade)
  ○ Probability distribution on the function values
● Acquisition function (green shade) (see the sketch after this list)
  ○ A function of
    ■ the objective value (exploitation) in the Gaussian density function; and
    ■ the uncertainty in the prediction value (exploration).
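A minimal sketch of this loop using a Gaussian process and a lower-confidence-bound acquisition function; the Matern kernel, the acquisition form, and the toy objective are assumptions.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def bayesian_optimization(objective, bounds, n_init=5, n_iter=20, kappa=2.0, rng=None):
        """Fit a GP to the points evaluated so far, then evaluate the objective at
        the minimizer of mu - kappa*sigma (exploitation vs. exploration)."""
        rng = np.random.default_rng(0) if rng is None else rng
        lo, hi = bounds
        X = rng.uniform(lo, hi, size=(n_init, 1))
        y = np.array([objective(x[0]) for x in X])
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
        candidates = np.linspace(lo, hi, 500).reshape(-1, 1)
        for _ in range(n_iter):
            gp.fit(X, y)
            mu, sigma = gp.predict(candidates, return_std=True)
            x_next = candidates[np.argmin(mu - kappa * sigma)]  # lower-confidence-bound acquisition
            X = np.vstack([X, [x_next]])
            y = np.append(y, objective(x_next[0]))
        return X[np.argmin(y)], y.min()

    # Assumed example: a bounded one-dimensional hyperparameter search
    objective = lambda x: np.sin(3 * x) + 0.5 * (x - 1.0) ** 2
    print(bayesian_optimization(objective, bounds=(0.0, 2.0)))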

Bayesian optimization

● Slower than grid-search when the objective has a low level of smoothness
● Faster than grid-search when the objective has a high level of smoothness

[Plots: Grid-search vs. Bayesian optimization at different levels of smoothness, annotated with a measure of smoothness]

Summary

Non-convex optimization

● Strategy 1: Local non-convex optimization
  ○ Convex convergence rates apply
  ○ Escape saddle points using, for example, cubic regularization and the saddle-free Newton update
● Strategy 2: Relaxing the non-convex problem to a convex problem
  ○ Convex neural networks
● Strategy 3: Global non-convex optimization
  ○ Bayesian optimization
  ○ Matrix completion
