Back to GD
We've seen that if ‖∇f(x)‖ ≤ L for all x and f is convex, then with step size µ proportional to ε, for an error ε we require on the order of L²/ε² iterations.
Can we do better if we assume more about f?
Theorem: If f is convex and L-smooth, then GD with step size µ = 1/L satisfies
f(x^(t)) − f(x*) ≤ (L / (2t)) ‖x^(0) − x*‖².
We won't prove this theorem, but a key element in its proof is the fact that for L-smooth convex functions we have
f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (L/2)‖y − x‖².
With L-smoothness we only need on the order of L/ε iterations to get ε error.
FYI: one can modify GD to get projected GD for L-smooth functions, with the same convergence rate, for solving min f(x) subject to x ∈ C with C convex.
Picking µ in Practice
The convergence theorems we've seen so far require knowing the Lipschitz constant or smoothness constant associated with f(x) in order to set µ.
This is not always possible in practice. Idea: use the best µ^(t) at every iteration:
x^(t+1) = x^(t) − µ^(t) ∇f(x^(t)), picking µ^(t) to minimize f(x^(t) − µ ∇f(x^(t))) over µ (the variable we're optimizing here is µ^(t)).
Often solving this exactly is hard, so we settle for an approximate solution.
Possible solution: set µ^(t) using a backtracking line search at every iteration.
Here, for any descent direction p (i.e. ∇f(x)ᵀp < 0), we have for small enough δ > 0 that
f(x) + δ ∇f(x)ᵀp ≤ f(x + δp) ≤ f(x) + γ δ ∇f(x)ᵀp ≤ f(x)   (*)
(the first inequality because f is convex, the others because p is a descent direction and δ is small).
Note that p = −∇f(x) is a descent direction, so there is a small constant δ > 0 that makes this inequality true.
Take p = −∇f(x) as in GD and plug it into (*): we expect that if µ is small enough,
f(x − µ∇f(x)) ≤ f(x) − γ µ ‖∇f(x)‖².
Idea: Fix γ, say γ = 0.5, and start with e.g. µ = 1. Check whether f(x − µ∇f(x)) is smaller than f(x) − γ µ ‖∇f(x)‖²; if not, shrink µ, e.g. µ ← 0.8µ, trying µ, 0.8µ, 0.8²µ, ... in turn.
Example by picture
[Sketch: f along the ray x^(t) − µ∇f(x^(t)) in the direction −∇f(x^(t)). The trials µ = 1 and µ = 0.8 land above f(x^(t)) − γµ‖∇f(x^(t))‖² (too big, still too big), while µ = 0.8² satisfies the condition (good).]
More precisely:
Backtracking line search: Pick β, γ ∈ (0, 1). At each GD step t:
1. Set v = ∇f(x^(t)).
2. Set µ = 1.
3. If f(x^(t) − µv) ≤ f(x^(t)) − γ µ ‖v‖², then keep µ.
   Else set µ ← βµ.
4. Repeat step 3 until the condition holds.
Remark: The condition
f(x^(t) − µ∇f(x^(t))) ≤ f(x^(t)) − γ µ ‖∇f(x^(t))‖²
is known as the Armijo condition, and it guarantees that the function value decreases by some non-zero amount.
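As a concrete sketch (not from the notes; the function arguments, parameter defaults, and fixed iteration count are my own choices for illustration), the backtracking GD loop can be written in Python as:

```python
import numpy as np

def gd_backtracking(f, grad, x0, gamma=0.5, beta=0.8, n_steps=100):
    """Gradient descent where each step size mu is found by backtracking
    until the Armijo condition f(x - mu*v) <= f(x) - gamma*mu*||v||^2 holds."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        v = grad(x)              # 1. descent direction is -v
        mu = 1.0                 # 2. start with mu = 1
        while f(x - mu * v) > f(x) - gamma * mu * np.dot(v, v):
            mu *= beta           # 3./4. shrink mu until the Armijo condition holds
        x = x - mu * v           # accepted GD step
    return x
```

The inner while loop is exactly steps 2-4 above; the outer loop just runs a fixed number of GD steps.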
Example: Consider the function f: R² → R with
f(x₁, x₂) = (x₁ − 1)⁴ + (x₁ − x₂ − 1)²,
which has minimizer (1, 0).
Its gradient is ∇f(x) = (4(x₁ − 1)³ + 2(x₁ − x₂ − 1), −2(x₁ − x₂ − 1))ᵀ. Suppose x^(0) = (0, 0), so that f(x^(0)) = 2 and ∇f(x^(0)) = (−6, 2).
We want to pick µ to minimize
f(x^(0) − µ∇f(x^(0))) = f(6µ, −2µ) = (6µ − 1)⁴ + (8µ − 1)².
This is a nonlinear equation in µ that can be hard to minimize exactly.
So let's run backtracking line search with γ = 0.5 and β = 0.8.
We have ∇f(x^(0)) = (−6, 2) and ‖∇f(x^(0))‖² = 40, and we look for µ so that x^(1) = x^(0) − µ∇f(x^(0)) satisfies the Armijo condition.
So we try µ = 1: x^(0) − ∇f(x^(0)) = (6, −2), and
f(6, −2) = 674, which is not ≤ f(x^(0)) − γ · 1 · ‖∇f(x^(0))‖² = 2 − 20 = −18, so µ = 1 is too big.
µ = 0.8: f(x^(0) − 0.8∇f(x^(0))) ≈ 237.67 > 2 − 0.5 · 0.8 · 40 = −14, same issue.
µ = 0.8²: f(x^(0) − 0.8²∇f(x^(0))) ≈ 82 > 2 − 0.5 · 0.8² · 40 = −10.8, same issue.
Finally, µ = 0.8¹¹ ≈ 0.0859 gives
f(x^(0) − µ∇f(x^(0))) ≈ 0.1530 ≤ f(x^(0)) − 0.5 · 0.8¹¹ · ‖∇f(x^(0))‖² ≈ 0.282.
So we choose µ = 0.8¹¹ and set
x^(1) = x^(0) − 0.8¹¹ ∇f(x^(0)) ≈ (0.5154, −0.1718).
Then we repeat the process at x^(1) to obtain x^(2), and so on.
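A quick numerical check (not in the notes), using the same f and ∇f as in this example, that reproduces the first backtracking step:

```python
import numpy as np

f = lambda x: (x[0] - 1)**4 + (x[0] - x[1] - 1)**2
grad = lambda x: np.array([4*(x[0] - 1)**3 + 2*(x[0] - x[1] - 1),
                           -2*(x[0] - x[1] - 1)])

x0, gamma, beta = np.zeros(2), 0.5, 0.8
v = grad(x0)                                  # (-6, 2)
mu, k = 1.0, 0
while f(x0 - mu * v) > f(x0) - gamma * mu * np.dot(v, v):
    mu *= beta                                # try 1, 0.8, 0.8^2, ...
    k += 1
print(k, f(x0 - mu * v), x0 - mu * v)         # 11, ~0.1530, ~(0.5154, -0.1718)
```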
FYI: for an L-smooth convex function with µ^(t) set by backtracking line search as above, we have
f(x^(t)) − f(x*) ≤ ‖x^(0) − x*‖² / (2 µ_min t), where µ_min = min(1, β/L),
so we still get f(x^(t)) − f(x*) ≤ O(1/t).
More can be said when we assume more about the function (e.g. strong convexity), and more can be said about line search methods in general. We may return to these topics later if time permits.
Newton's Method
So far we have used GD, which was derived from the 1st-order Taylor approximation
f(x) ≈ f(x^(t)) + ∇f(x^(t))ᵀ(x − x^(t)).
We want this to be as small (as negative) as possible, so we pick x = x^(t) − µ∇f(x^(t)) to get
f(x) ≈ f(x^(t)) − µ‖∇f(x^(t))‖².
If instead we use a 2nd-order Taylor approximation, we obtain Newton's method:
f(x) ≈ f(x^(t)) + ∇f(x^(t))ᵀ(x − x^(t)) + ½ (x − x^(t))ᵀ ∇²f(x^(t)) (x − x^(t)).
At its minimum we expect the gradient of the right-hand side to be 0, because the right-hand side is convex (when ∇²f(x^(t)) ⪰ 0). So, taking the gradient of both sides and setting it to 0,
0 = ∇f(x^(t)) + ∇²f(x^(t)) (x − x^(t))
at the minimizer, so we set
x^(t+1) = x^(t) − [∇²f(x^(t))]⁻¹ ∇f(x^(t)).

Example: Let f: (0, ∞) → R be given by f(x) = x − ln(x).
Then f'(x) = 1 − 1/x and f''(x) = 1/x².
Newton's method starting at x^(0) = 0.5 generates the iterates
x^(t+1) = x^(t) − [f''(x^(t))]⁻¹ f'(x^(t)) = x^(t) − (x^(t))² (1 − 1/x^(t)) = 2x^(t) − (x^(t))²,
so
x^(1) = 2(0.5) − (0.5)² = 1 − 0.25 = 0.75
x^(2) = 0.9375
x^(3) ≈ 0.9961
x^(4) ≈ 0.99998.
Note that the optimum is x* = 1.
If this seems fast, it is not a coincidence.
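A small sketch (not in the notes) reproducing these iterates:

```python
fprime = lambda x: 1 - 1 / x        # f(x) = x - ln(x)
fsecond = lambda x: 1 / x**2

x = 0.5
for t in range(4):
    x = x - fprime(x) / fsecond(x)  # Newton step, equivalently x <- 2x - x^2
    print(t + 1, x)                 # 0.75, 0.9375, 0.99609..., 0.99998...
```

Note how the number of correct digits roughly doubles at each iteration.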
Definition: For a matrix M,
‖M‖ = max over v ≠ 0 of ‖Mv‖ / ‖v‖   (length of Mv over length of v).
‖M‖ measures how much the matrix M can stretch vectors. A consequence of the definition is that ‖Mz‖ ≤ ‖M‖ ‖z‖ for every vector z.
Theorem: Suppose f has a local minimum at x*, so that ∇f(x*) = 0. Suppose further that
‖[∇²f(x)]⁻¹‖ ≤ 1/h for some h > 0, and
‖∇²f(x) − ∇²f(y)‖ ≤ L ‖x − y‖ for all x, y.
Then Newton's method satisfies
‖x^(t+1) − x*‖ ≤ (L / (2h)) ‖x^(t) − x*‖².
Loose interpretation: If we start close enough to a local minimum and the function is nice, we converge quickly (quadratically) to the minimum.
Example: We want to minimize
f(x) = x₁⁴ + 2x₁²x₂² + x₂⁴
using Newton's method, so we need ∇f(x) and ∇²f(x):
∇f(x) = (4x₁³ + 4x₁x₂², 4x₁²x₂ + 4x₂³)ᵀ
∇²f(x) = [[12x₁² + 4x₂², 8x₁x₂], [8x₁x₂, 4x₁² + 12x₂²]].
Suppose we start at x^(0) = (1, 1). Then
x^(1) = x^(0) − [∇²f(x^(0))]⁻¹ ∇f(x^(0)) = (1, 1) − [[16, 8], [8, 16]]⁻¹ (8, 8) = (1, 1) − (1/3, 1/3) = (2/3, 2/3).
Continuing this way, x^(t) = (2/3)^t (1, 1), which approaches the minimizer (0, 0) exponentially fast.
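A short numerical check of this example (not in the notes; gradient and Hessian as written above):

```python
import numpy as np

grad = lambda x: np.array([4*x[0]**3 + 4*x[0]*x[1]**2,
                           4*x[0]**2*x[1] + 4*x[1]**3])
hess = lambda x: np.array([[12*x[0]**2 + 4*x[1]**2, 8*x[0]*x[1]],
                           [8*x[0]*x[1], 4*x[0]**2 + 12*x[1]**2]])

x = np.array([1.0, 1.0])
for t in range(5):
    x = x - np.linalg.solve(hess(x), grad(x))  # Newton step
    print(t + 1, x)                            # iterates are (2/3)^t * (1, 1)
```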
Example: Consider
f(x) = ¼x⁴ − x² + 2x,
and start at x^(0) = 0. Here
∇f(x) = x³ − 2x + 2 and ∇²f(x) = 3x² − 2,
so
x^(1) = x^(0) − [∇²f(x^(0))]⁻¹ ∇f(x^(0)) = 0 − 2/(−2) = 1
x^(2) = x^(1) − [∇²f(x^(1))]⁻¹ ∇f(x^(1)) = 1 − 1/1 = 0 = x^(0).
We are back to x^(0), so we have entered a cycle: Newton's method need not always converge.
Why does this not contradict the theorem?
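A tiny sketch (not in the notes) confirming the cycle numerically:

```python
grad = lambda x: x**3 - 2*x + 2   # f'(x) for f(x) = x^4/4 - x^2 + 2x
hess = lambda x: 3*x**2 - 2       # f''(x)

x = 0.0
for t in range(6):
    x = x - grad(x) / hess(x)     # Newton step
    print(t + 1, x)               # 1.0, 0.0, 1.0, 0.0, ...: a cycle, no convergence
```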
Some Remarks on Newton's Method
1. We can modify Newton's method to include a step size µ^(t):
x^(t+1) = x^(t) − µ^(t) [∇²f(x^(t))]⁻¹ ∇f(x^(t)),
where we can choose µ^(t) fixed or set it with a backtracking line search.
2. Finding the inverse of the Hessian can be very expensive if n is large. Instead, notice that
x^(t+1) = x^(t) − [∇²f(x^(t))]⁻¹ ∇f(x^(t))   ⟺   ∇²f(x^(t)) (x^(t+1) − x^(t)) = −∇f(x^(t)),
where ∇²f(x^(t)) and ∇f(x^(t)) are known, so we can use linear algebra techniques to solve this linear system for x^(t+1).
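For instance, with numpy one would implement the step as a linear solve rather than forming the inverse (a minimal sketch, not from the notes):

```python
import numpy as np

def newton_step(x, grad, hess):
    """One Newton step: solve hess(x) d = -grad(x) for d instead of
    computing the inverse Hessian explicitly."""
    d = np.linalg.solve(hess(x), -grad(x))
    return x + d
```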
3. Newton's method, when it converges, tends to do so much faster than GD (compare the convergence theorems).
Quasi-Newton Methods (very briefly)

Recall that GD has the interpretation
f(x) ≈ f(x^(t)) + ∇f(x^(t))ᵀ(x − x^(t)) + (1/(2µ)) ‖x − x^(t)‖²;
minimizing the right-hand side gives
x^(t+1) = x^(t) − µ ∇f(x^(t)).
Meanwhile, Newton's method comes from
f(x) ≈ f(x^(t)) + ∇f(x^(t))ᵀ(x − x^(t)) + ½ (x − x^(t))ᵀ ∇²f(x^(t)) (x − x^(t));
minimizing the right-hand side gives
x^(t+1) = x^(t) − [∇²f(x^(t))]⁻¹ ∇f(x^(t)).
So we can think of GD as approximating the Hessian with (1/µ) · I, a scaled identity matrix.
Quasi-Newton methods approximate the Hessian with some other matrix B_t, which changes from iteration to iteration, so that
x^(t+1) = x^(t) − µ^(t) B_t⁻¹ ∇f(x^(t)).
There are several such methods with different choices for B_t. We won't cover them here, but examples include the BFGS method and Broyden's method.
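As a pointer (not covered in the notes), BFGS is available in standard libraries. A minimal usage sketch with scipy, reusing the quartic example from the line-search section:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1)**4 + (x[0] - x[1] - 1)**2
grad = lambda x: np.array([4*(x[0] - 1)**3 + 2*(x[0] - x[1] - 1),
                           -2*(x[0] - x[1] - 1)])

# BFGS builds its Hessian approximation B_t from successive gradient differences.
result = minimize(f, x0=np.zeros(2), jac=grad, method="BFGS")
print(result.x)   # close to the minimizer (1, 0)
```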