A KTEC Center of Excellence 1
Pattern Analysis using Convex Optimization: Part 2 of
Chapter 7 Discussion
Presenter: Brian Quanz
A KTEC Center of Excellence 2
About today’s discussion…
• Last time: discussed convex optimization
• Today: apply what we learned to 4 pattern analysis problems given in the book:
• (1) Smallest enclosing hypersphere (one-class SVM)
• (2) SVM classification
• (3) Support vector regression (SVR)
• (4) On-line classification and regression
A KTEC Center of Excellence 3
About today’s discussion…
• This time, for the most part:
• Describe problems
• Derive solutions ourselves on the board!
• Apply convex opt. knowledge to solve
• Mostly board work today
A KTEC Center of Excellence 4
Recall: KKT Conditions
• What we will use (see the sketch below):
• Key to remember for ch. 7:
• Complementary slackness -> sparse dual representation
• Convexity -> efficient global solution
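The slide’s equations did not survive extraction; as a reminder, a standard statement of the KKT conditions for a convex problem min f(x) s.t. g_i(x) <= 0, h_j(x) = 0 is sketched here:
\[
\nabla f(x^*) + \sum_i \lambda_i \nabla g_i(x^*) + \sum_j \nu_j \nabla h_j(x^*) = 0 \quad \text{(stationarity)}
\]
\[
g_i(x^*) \le 0,\; h_j(x^*) = 0 \quad \text{(primal feasibility)}, \qquad \lambda_i \ge 0 \quad \text{(dual feasibility)}
\]
\[
\lambda_i \, g_i(x^*) = 0 \quad \text{(complementary slackness)}
\]
Complementary slackness is what forces most dual variables to zero and hence gives the sparse dual representation noted above.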
A KTEC Center of Excellence 5
Novelty Detection: Hypersphere
• Training data – learn its support
• Capture the support with a hypersphere
• Points outside – ‘novel’, ‘abnormal’, or ‘anomalous’
• Smaller sphere = more fine-tuned novelty detection
A KTEC Center of Excellence 6
1st: Smallest Enclosing Hypersphere
• Given: a training set S
• Find the center c of the smallest hypersphere containing S
A KTEC Center of Excellence 7
S.E.H. Optimization Problem
• O.P.: (sketched below)
• Let’s solve it using the Lagrangian and KKT conditions and discuss
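The optimization problem itself is not in the extracted text; a sketch of the standard smallest-enclosing-hypersphere primal, over a training set S = {x_1, …, x_ℓ} mapped by φ, is:
\[
\min_{c,\, r} \; r^2 \quad \text{s.t.} \quad \|\phi(x_i) - c\|^2 \le r^2, \quad i = 1, \dots, \ell,
\]
with Lagrangian
\[
L(c, r, \alpha) = r^2 + \sum_{i=1}^{\ell} \alpha_i \left( \|\phi(x_i) - c\|^2 - r^2 \right), \qquad \alpha_i \ge 0.
\]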
A KTEC Center of Excellence 8
Cheat
A KTEC Center of Excellence 9
S.E.H.: Solution
• H(x) = 1 if x >= 0, 0 otherwise (Heaviside step function)
• Dual = primal at the optimum (strong duality)
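The equations on this slide are missing from the extract; setting the derivatives of the Lagrangian to zero gives Σ_i α_i = 1 and c = Σ_i α_i φ(x_i), and substituting back yields the dual (a sketch):
\[
\max_{\alpha} \; \sum_i \alpha_i k(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j)
\quad \text{s.t.} \quad \sum_i \alpha_i = 1, \; \alpha_i \ge 0,
\]
\[
f(x) = H\!\left( \|\phi(x) - c\|^2 - r^2 \right), \qquad c = \sum_i \alpha_i \phi(x_i),
\]
with r^2 recovered from any support vector (α_i > 0) via complementary slackness.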
A KTEC Center of Excellence 10
Theorem: bound on the false-positive rate
A KTEC Center of Excellence 11
Hypersphere that only contains some of the data – soft hypersphere
• Balance between missing some points and reducing the radius
• Robustness – a single outlying point could throw off the estimate
• Introduce slack variables (a repeated approach)
• Slack is 0 within the sphere, the squared distance outside
A KTEC Center of Excellence 12
Hypersphere optimization problem
• Now with a trade-off between the radius and training-point error (sketched below):
• Let’s derive the solution again
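A sketch of the soft version, with slack variables ξ_i and trade-off parameter C (the slide’s equation is not in the extract):
\[
\min_{c,\, r,\, \xi} \; r^2 + C \sum_{i=1}^{\ell} \xi_i
\quad \text{s.t.} \quad \|\phi(x_i) - c\|^2 \le r^2 + \xi_i, \; \xi_i \ge 0.
\]
The dual is the same as before except for the box constraint 0 ≤ α_i ≤ C.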
A KTEC Center of Excellence 13
Cheat
A KTEC Center of Excellence 14
Soft hypersphere solution
A KTEC Center of Excellence 15
Linear Kernel Example
A KTEC Center of Excellence 16
Similar theorem
A KTEC Center of Excellence 17
Remarks
• If the data lies in a subspace of the feature space:
• The hypersphere overestimates the support in the perpendicular directions
• Can use kernel PCA (next week’s discussion)
• If the data is normalized (k(x,x) = 1):
• Corresponds to separating the data from the origin with a hyperplane
A KTEC Center of Excellence 18
Maximal Margin Classifier
• Data and a linear classifier
• Hinge loss, margin gamma (see the sketch below)
• Linearly separable if a positive margin gamma can be achieved on all training points
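For reference (the slide’s definitions are not in the extract), the functional margin of (w, b) on example (x_i, y_i) and the hinge loss relative to a target margin γ are, in standard form:
\[
\gamma_i = y_i \left( \langle w, \phi(x_i) \rangle + b \right), \qquad
\mathcal{L}_{\text{hinge}} = \max\!\left( 0, \; \gamma - y_i \left( \langle w, \phi(x_i) \rangle + b \right) \right),
\]
and the data is linearly separable in feature space if some (w, b) achieves γ_i ≥ γ > 0 for all i.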
A KTEC Center of Excellence 19
Margin Example
A KTEC Center of Excellence 20
Typical formulation
• The typical formulation fixes the functional margin gamma to 1 and allows ||w|| to vary; since rescaling (w, b) doesn’t affect the decision boundary, the geometric margin, proportional to 1/||w||, varies instead.
• Here we instead fix ||w|| = 1 and vary the functional margin gamma
A KTEC Center of Excellence 21
Hard Margin SVM
• Arrive at the optimization problem (sketched below)
• Let’s solve it
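In the fixed-norm formulation described on the previous slide, a sketch of the hard-margin problem is:
\[
\max_{w,\, b,\, \gamma} \; \gamma
\quad \text{s.t.} \quad y_i \left( \langle w, \phi(x_i) \rangle + b \right) \ge \gamma, \quad \|w\|^2 = 1,
\]
which is equivalent (after rescaling so that the functional margin is 1) to the conventional form
\[
\min_{w,\, b} \; \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i \left( \langle w, \phi(x_i) \rangle + b \right) \ge 1.
\]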
A KTEC Center of Excellence 22
Cheat
A KTEC Center of Excellence 23
Solution
• Recall:
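The slide’s equations are missing; for the conventional (functional margin = 1) form, the dual and the resulting classifier are, as a sketch:
\[
\max_{\alpha} \; \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j k(x_i, x_j)
\quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \; \alpha_i \ge 0,
\]
\[
w = \sum_i \alpha_i y_i \phi(x_i), \qquad f(x) = \operatorname{sgn}\!\left( \sum_i \alpha_i y_i k(x_i, x) + b \right),
\]
with b recovered from any support vector via complementary slackness.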
A KTEC Center of Excellence 24
Example with Gaussian kernel
A KTEC Center of Excellence 25
Soft Margin Classifier
• Non-separable case – introduce slack variables as before
• Trade off with the 1-norm of the error vector (sketched below)
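A sketch of the 1-norm soft-margin primal (trading the margin against the 1-norm of the slack vector, as stated above):
\[
\min_{w,\, b,\, \xi} \; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} \xi_i
\quad \text{s.t.} \quad y_i \left( \langle w, \phi(x_i) \rangle + b \right) \ge 1 - \xi_i, \; \xi_i \ge 0.
\]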
A KTEC Center of Excellence 26
Solve Soft Margin SVM
• Let’s solve it!
A KTEC Center of Excellence 27
Soft Margin Solution
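The solution is not in the extract; the dual is the same as the hard-margin dual except that the slack term adds a box constraint (a sketch):
\[
\max_{\alpha} \; \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j k(x_i, x_j)
\quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \; 0 \le \alpha_i \le C.
\]
Points with 0 < α_i < C lie exactly on the margin; α_i = C marks margin errors.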
A KTEC Center of Excellence 28
Soft Margin Example
A KTEC Center of Excellence 29
Support Vector Regression
• Similar idea to classification, except turned inside-out
• Epsilon-insensitive loss instead of hinge
• Ridge Regression: Squared-error loss
A KTEC Center of Excellence 30
Support Vector Regression
• But we want to encourage sparseness
• Need inequality constraints: epsilon-insensitive loss
A KTEC Center of Excellence 31
Epsilon-insensitive loss
• Defines a band around the function within which the loss is 0
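In symbols (the slide’s formula is not in the extract), the ε-insensitive loss is:
\[
L_{\varepsilon}(y, f(x)) = \max\!\left( 0, \; |y - f(x)| - \varepsilon \right),
\]
which is zero whenever the prediction lies within the band of half-width ε around the target.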
A KTEC Center of Excellence 32
SVR (linear epsilon-insensitive loss)
• Optimization problem (sketched below):
• Let’s solve it again
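A sketch of the standard ε-insensitive SVR primal, with slacks ξ_i for points above the band and ξ̂_i for points below it:
\[
\min_{w,\, b,\, \xi,\, \hat{\xi}} \; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} \left( \xi_i + \hat{\xi}_i \right)
\]
\[
\text{s.t.} \quad y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \quad
\langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \hat{\xi}_i, \quad
\xi_i, \hat{\xi}_i \ge 0.
\]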
A KTEC Center of Excellence 33
SVR Dual and Solution
• Dual problem (sketched below)
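The dual and the resulting regression function, as a sketch consistent with the primal above (α_i for the first constraint, α̂_i for the second):
\[
\max_{\alpha,\, \hat{\alpha}} \; \sum_i y_i \left( \alpha_i - \hat{\alpha}_i \right) - \varepsilon \sum_i \left( \alpha_i + \hat{\alpha}_i \right)
- \tfrac{1}{2} \sum_{i,j} \left( \alpha_i - \hat{\alpha}_i \right)\left( \alpha_j - \hat{\alpha}_j \right) k(x_i, x_j)
\]
\[
\text{s.t.} \quad \sum_i \left( \alpha_i - \hat{\alpha}_i \right) = 0, \quad 0 \le \alpha_i, \hat{\alpha}_i \le C,
\qquad f(x) = \sum_i \left( \alpha_i - \hat{\alpha}_i \right) k(x_i, x) + b.
\]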
A KTEC Center of Excellence 34
Online
• So far batch learning: all data processed at once
• Many tasks require data to be processed one example at a time, from the start
• Learner:
• Makes a prediction
• Gets feedback (the correct value)
• Updates its hypothesis
• A conservative algorithm only updates when the loss is non-zero
A KTEC Center of Excellence 35
Simple On-line Alg.: Perceptron
• Thresholded linear function
• At step t+1 the weight vector is updated if an error is made
• Dual update rule:
• If y_i ⟨w_t, φ(x_i)⟩ <= 0: alpha_i ← alpha_i + 1
A KTEC Center of Excellence 36
Algorithm Pseudocode
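The pseudocode itself is not in the extracted text; below is a minimal, hedged sketch of the dual (kernel) perceptron with the conservative update just described. Function and variable names (kernel_perceptron, epochs, etc.) are illustrative, not taken from the slides.

import numpy as np

def kernel_perceptron(X, y, kernel, epochs=10):
    """Dual-form perceptron: alpha[i] counts the updates made on example i."""
    n = len(y)
    alpha = np.zeros(n)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])  # Gram matrix
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            # dual prediction: sum_j alpha_j * y_j * k(x_j, x_i)
            if y[i] * np.dot(alpha * y, K[:, i]) <= 0:
                alpha[i] += 1      # conservative update: only on a mistake
                mistakes += 1
        if mistakes == 0:          # no errors in a full pass: converged
            break
    return alpha

# toy usage with a linear kernel
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
alpha = kernel_perceptron(X, y, lambda a, b: float(np.dot(a, b)))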
A KTEC Center of Excellence 37
Novikoff Theorem
• Convergence bound for the hard-margin case
• If the training points are contained in a ball of radius R around the origin
• w* is the hard-margin SVM solution with no bias and geometric margin gamma
• Initial weight: (see the statement sketched below)
• Number of updates bounded by: (see the statement sketched below)
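The missing quantities are the standard ones from Novikoff’s theorem; a sketch of the statement:
\[
w_0 = 0, \qquad \text{number of updates } t \le \left( \frac{R}{\gamma} \right)^{2},
\]
assuming every training point satisfies \(\|\phi(x_i)\| \le R\) and w* is a unit-norm separator with geometric margin γ.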
A KTEC Center of Excellence 38
Proof
• From 2 inequalities (sketched below):
• Putting these together we have:
• Which leads to the bound:
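The two inequalities and the resulting bound, sketched in the standard form:
\[
\langle w^*, w_t \rangle \ge t\gamma \quad \text{(each update adds at least } \gamma\text{)}, \qquad
\|w_t\|^2 \le t R^2 \quad \text{(each update adds at most } R^2\text{)}.
\]
\[
t\gamma \le \langle w^*, w_t \rangle \le \|w^*\| \, \|w_t\| \le \sqrt{t}\, R
\;\;\Longrightarrow\;\; t \le \left( \frac{R}{\gamma} \right)^{2}.
\]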
A KTEC Center of Excellence 39
Kernel Adatron
• A simple modification of the perceptron; models the hard-margin SVM with 0 threshold
• At convergence alpha stops changing: for each point, either alpha_i is positive and the right-hand term is 0, or the right-hand term is negative and alpha_i = 0
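A sketch of the usual kernel Adatron update for point i, with learning rate η, consistent with the fixed-point remark above:
\[
\alpha_i \leftarrow \max\!\left( 0, \; \alpha_i + \eta \left( 1 - y_i \sum_j \alpha_j y_j k(x_j, x_i) \right) \right).
\]
At a fixed point, either α_i > 0 and the bracketed term is 0, or the bracketed term is negative and α_i is clipped at 0.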
A KTEC Center of Excellence 40
Kernel Adatron – Soft Margin
• 1-norm soft margin version
• Add upper bound to the values of alpha (C)
• 2-norm soft margin version
• Add constant to diagonal of kernel matrix
• SMO
• To allow a variable threshold, updates must be made on a pair of examples at once
• Results in SMO
• The rate of convergence of both algorithms is sensitive to the order of the examples
• Good heuristics exist, e.g. choose the points that most violate the conditions first
A KTEC Center of Excellence 41
On-line regression
• The approach also works for the regression case
• Basic gradient ascent with additional constraints
A KTEC Center of Excellence 42
Online SVR
A KTEC Center of Excellence 43
Questions
• Questions, Comments?