Numerical Optimization with PETSc/TAO

Alp Dener, Todd Munson
Mathematics and Computer Science Division, Argonne National Laboratory

Exascale Computing Project Annual Meeting, Houston, TX, January 14, 2019
Outline

- The Basics: Unconstrained Optimization; Convex Example; Nonconvex Example
- Globalization: Locally Convergent Algorithm; Trust-Region; Line Search
- Algorithms: Bound Constraints; First-order Methods; Second-order Methods; How to Choose?
- Simulation Constraints: Reduced-Space Approach; Evaluating Derivatives
Unconstrained Optimization (The Basics)

- Generic optimization problem

      min_x  f(x)

  - x ∈ ℝⁿ are the optimization variables
  - f : ℝⁿ → ℝ is the objective function
- Problem classification
  - Differentiability: discontinuous, C⁰, C¹, C², ...
  - Topology: convex vs. nonconvex
  - Type: linear, quadratic, nonlinear, etc.
Unconstrained Optimization (The Basics)

      min_x  f(x)

- Solve ∇f(x*) = 0 (first-order necessary condition)
- x* is a global minimum if f(x*) ≤ f(x) ∀ x ∈ ℝⁿ
  - Global solutions need not be unique
  - All global solutions have the same objective value
- x* is a local minimum if f(x*) ≤ f(x) ∀ x ∈ N(x*), where N(x*) is a neighborhood of x*
  - Local solutions need not be unique
  - Local solutions need not have the same objective value
TAO Example: Rosenbrock Function (The Basics)

      min_{x,y}  f(x, y) = (1 − x)² + 99(y − x²)²
TAO Example: Rosenbrock Function (The Basics)

PetscErrorCode FormFunctionGradient(Tao tao, Vec X, PetscReal *f, Vec G, void *ptr)
{
  PetscErrorCode    ierr;
  PetscReal         ff = 0;
  PetscScalar       *g;
  const PetscScalar *x;

  PetscFunctionBeginUser;
  /* Get pointers to vector data */
  ierr = VecGetArrayRead(X, &x); CHKERRQ(ierr);
  ierr = VecGetArray(G, &g); CHKERRQ(ierr);

  /* Compute function value and gradient */
  ff   = (1 - x[0])*(1 - x[0]) + 99*(x[1] - x[0]*x[0])*(x[1] - x[0]*x[0]);
  g[0] = -2*(1 - x[0]) - 4*99*(x[1] - x[0]*x[0])*x[0];
  g[1] = 2*99*(x[1] - x[0]*x[0]);

  /* Restore vectors */
  ierr = VecRestoreArrayRead(X, &x); CHKERRQ(ierr);
  ierr = VecRestoreArray(G, &g); CHKERRQ(ierr);
  *f = ff;
  PetscFunctionReturn(0);
}
TAO Example: Rosenbrock Function (The Basics)

int main(int argc, char **argv)
{
  ...
  ierr = PetscInitialize(&argc, &argv, (char *)0, help); if (ierr) return ierr;
  ...
  /* Create TAO solver with desired solution method */
  ierr = TaoCreate(PETSC_COMM_SELF, &tao); CHKERRQ(ierr);
  ierr = TaoSetType(tao, TAOBQNLS); CHKERRQ(ierr);

  /* Set solution vec and initial guess */
  ierr = VecSet(x, zero); CHKERRQ(ierr);
  ierr = TaoSetInitialVector(tao, x); CHKERRQ(ierr);

  /* Set user routine for function and gradient evaluation */
  ierr = TaoSetObjectiveAndGradientRoutine(tao, FormFunctionGradient, &user); CHKERRQ(ierr);

  /* Check for command-line options and start the solution */
  ierr = TaoSetFromOptions(tao); CHKERRQ(ierr);
  ierr = TaoSolve(tao); CHKERRQ(ierr);
  ...
  /* Clean up TAO and PETSc */
  ierr = TaoDestroy(&tao); CHKERRQ(ierr);
  ierr = PetscFinalize();
}
TAO Example: Rosenbrock Function (The Basics)

$ cd src/tao/unconstrained/examples/tutorials
$ make rosenbrock1
$ ./rosenbrock1 -tao_smonitor -tao_gatol 1e-4

iter =  0, Function value 1.,           Residual: 2.
iter =  1, Function value 0.791905,     Residual: 3.15898
iter =  2, Function value 0.735272,     Residual: 8.61386
iter =  3, Function value 0.599666,     Residual: 2.695
iter =  4, Function value 0.471982,     Residual: 1.38514
iter =  5, Function value 0.390194,     Residual: 4.06039
iter =  6, Function value 0.313901,     Residual: 6.38821
iter =  7, Function value 0.198935,     Residual: 0.65438
iter =  8, Function value 0.159664,     Residual: 2.29449
iter =  9, Function value 0.130565,     Residual: 3.76042
iter = 10, Function value 0.0863548,    Residual: 4.34492
iter = 11, Function value 0.0468268,    Residual: 0.341378
iter = 12, Function value 0.0298555,    Residual: 2.55348
iter = 13, Function value 0.0190191,    Residual: 2.93464
iter = 14, Function value 0.00645537,   Residual: 0.526141
iter = 15, Function value 0.00373702,   Residual: 2.4349
iter = 16, Function value 0.00149341,   Residual: 0.164182
iter = 17, Function value 0.000622109,  Residual: 0.167388
iter = 18, Function value 3.7352e-05,   Residual: 0.233549
iter = 19, Function value 6.65726e-06,  Residual: 0.102553
iter = 20, Function value 5.1701e-09,   Residual: 0.00232561
iter = 21, Function value 2.16521e-12,  Residual: 5.29036e-05
Convex Example (The Basics)

[Plot: f(x) = (x − 0.5)⁴ + (x + 0.5)² − 0.5 with a tangent line lying entirely below the curve]

- f : ℝⁿ → ℝ is convex if f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) ∀ x, y ∈ ℝⁿ
Convex Example (The Basics)

[Plot: f(x) = (x − 0.5)⁴ + (x + 0.5)² − 0.5]

- Global minimum: ∇f(x) = 4(x − 0.5)³ + 2(x + 0.5) = 0
Nonconvex Example (The Basics)

[Plot: f(x) = −x⁴ + 5x² + x − 4 with a tangent line crossing the curve]

- Portions of the function lie below the tangent line
Nonconvex Example (The Basics)

[Plot: f(x) = −x⁴ + 5x² + x − 4]

- Stationary points: ∇f(x) = −4x³ + 10x + 1 = 0
- Local maximizers: ∇²f(x) = −12x² + 10 < 0
- Local minimizer: ∇²f(x) = −12x² + 10 > 0
Nonconvex Example (The Basics)

[Plot: f(x) = −x⁴ + 5x² + x − 4]

- The ∇f(x*) = 0 condition is necessary but not sufficient
- ∇²f(x*) positive definite at the solution is the second-order sufficient condition,
  i.e.: yᵀ ∇²f(x*) y > 0 ∀ y ≠ 0
Locally Convergent Algorithm (Globalization)

- Most good algorithms are variants of Newton's method
- Attempt to compute local minimizers for f : ℝⁿ → ℝ
  1. Form a Taylor series approximation around x_k:

         f(x) ≈ f(x_k) + ∇f(x_k)ᵀ(x − x_k) + ½ (x − x_k)ᵀ ∇²f(x_k) (x − x_k)

  2. Solve the quadratic optimization problem for d_k:

         min_{d_k}  f(x_k) + ∇f(x_k)ᵀ d_k + ½ d_kᵀ ∇²f(x_k) d_k

     - Convex case: solutions satisfy ∇²f(x_k) d_k = −∇f(x_k)
     - Nonconvex case: can be unbounded; use trust-region or line search methods
  3. Update the iterate and repeat until convergence
Trust-Region (Globalization)

1. Initialize trust radius Δ
2. Solve the trust-region subproblem

       min_{d_k}  f(x_k) + d_kᵀ ∇f(x_k) + ½ d_kᵀ H(x_k) d_k
       subject to ‖d_k‖₂ ≤ Δ

3. Compute predicted decrease δf_pred = d_kᵀ ∇f(x_k) + ½ d_kᵀ H(x_k) d_k
4. Compute actual decrease δf_actual = f(x_k + d_k) − f(x_k)
5. Accept iterate x_{k+1} = x_k + d_k if δf_actual ≥ δf_pred
6. Update trust radius
   - Increase Δ if δf_actual ≫ δf_pred
   - Decrease Δ if δf_actual < δf_pred
Illustration on Nonconvex Problem (Globalization)

[Sequence of plots: globalized iterates on f(x) = −x⁴ + 5x² + x − 4, shown frame by frame]
Line Search (Globalization)

1. Initialize perturbation λ_k to zero
2. Solve the perturbed quadratic model

       min_{d_k}  f(x_k) + d_kᵀ ∇f(x_k) + ½ d_kᵀ [H(x_k) + λ_k I] d_k

3. Search f(x_k + α_k d_k) for an appropriate step length α_k
   - Moré-Thuente line search
     - Satisfies the strong Wolfe conditions:

           f(x_k + α_k d_k) ≤ f(x_k) + σ α_k ∇f(x_k)ᵀ d_k
           |∇f(x_k + α_k d_k)ᵀ d_k| ≤ δ |∇f(x_k)ᵀ d_k|

     - Trial step lengths from cubic interpolation
4. Update the iterate
   - If a valid α_k is found, accept x_{k+1} = x_k + α_k d_k and decrease λ_k
   - If no valid α_k is found, increase λ_k
Bound Constraints (Algorithms)

      min_x  f(x)
      subject to b_l ≤ x ≤ b_u

- b_l, b_u ∈ ℝⁿ are bounds on the optimization variables
- Also called "box" constraints
- Active-set approach
  - Compute a search direction only for variables inside the bounds
  - Snap variables outside the bounds to the bounds
  - Project trial steps into the bounds during globalization
- Bounded algorithms in TAO also solve unconstrained problems
First-order Methods (Algorithms)

- Require only ∇f(x); the Hessian ∇²f(x) is approximated
- Objective function must be C¹-continuous
- TAO solvers
  - BNCG – Bounded Nonlinear Conjugate Gradient
    - Steepest descent – do not use!
    - Classic methods: Fletcher-Reeves, Polak-Ribière-Polyak, Hestenes-Stiefel
    - Modern methods: SSML-BFGS, Dai-Yuan, Hager-Zhang, Dai-Kou, Kou-Dai
    - Nonlinear preconditioning with diagonalized quasi-Newton
  - BQNLS – Bounded Quasi-Newton Line Search
    - Limited-memory matrix-free implementation
    - Restricted Broyden family of methods (default: BFGS)
    - Sparse diagonal Hessian initialization
Second-order Methods (Algorithms)

- Require both ∇f(x) and ∇²f(x)
- Objective function must be C²-continuous
- TAO solvers
  - BNLS – Bounded Newton Line Search
  - BNTR – Bounded Newton Trust-Region
  - BNTL – Bounded Newton Trust-Region w/ Line Search Fallback
- Iteratively invert ∇²f(x) using PETSc KSP methods
  - BNTR and BNTL use trust-region Krylov methods (e.g.: Steihaug-Toint Conjugate Gradient)
  - Preconditioned with a quasi-Newton approximation to ∇²f(x)
  - PETSc preconditioners are also available (e.g.: ICC)
Providing Hessians in TAO (Algorithms)

PetscErrorCode FormHessian(Tao tao, Vec X, Mat H, Mat Hpre, void *ptr)
...

int main(int argc, char **argv)
{
  ...
  ierr = TaoSetHessianRoutine(tao, H, Hpre, FormHessian, &user); CHKERRQ(ierr);
  ...
}

- Evaluate the Hessian (H) and its preconditioner (Hpre) at the point X
- Hpre and H can be the same Mat object
How to Choose? (Algorithms)

Second-order methods:
- Achieve the solution in few nonlinear iterations
- Can efficiently converge to a tighter tolerance
- May be slower than first-order methods if the Hessian computation is not efficient

First-order methods:
- Small memory footprint, suitable for large-scale problems
- Avoid expensive and challenging Hessian computation
- May take many nonlinear iterations and require many function and gradient evaluations
Switching Solvers in TAO (Algorithms)

$ ./rosenbrock1 -tao_monitor -tao_type bncg -tao_bncg_type ssml_bfgs
$ ./rosenbrock1 -tao_monitor -tao_type bqnls -tao_bqnls_mat_lmvm_num_vecs 5
$ ./rosenbrock1 -tao_monitor -tao_type bnls -tao_bnk_pc_type icc

- Switch solvers with -tao_type <bncg, bqnls, bnls, ...>
- Switch CG types with -tao_bncg_type <gd, fr, hz, ssml_bfgs, ...>
- Control the limited-memory quasi-Newton history size with -tao_bqnls_mat_lmvm_num_vecs N
- Change the Newton-Krylov preconditioner type with -tao_bnk_pc_type <ilu, icc, lmvm, ...>
- See the PETSc/TAO manual pages online for the complete list of algorithm options
Performance Profiles: First-order (Algorithms)

[Performance profile, P_c(r_{p,c} ≤ π : c ∈ C) vs. π, comparing L-BFGS, SSML-BFGS, and Steepest Descent]
Performance Profiles: Second-order (Algorithms)

[Performance profile, P_c(r_{p,c} ≤ π : c ∈ C) vs. π, comparing BNLS, BNTR, and BNTL]
Performance Profiles: Versus (Algorithms)

[Performance profile, P_c(r_{p,c} ≤ π : c ∈ C) vs. π, comparing BNLS, L-BFGS, and SSML-BFGS]
Performance Profiles: Versus (Algorithms)

[Percentage of problems solved vs. CPU time (s), comparing BNLS, L-BFGS, and SSML-BFGS]
TAO Example: The Obstacle Problem (Algorithms)

      min_u  ∫_Ω |∇u|² dx
      subject to u(x) ≥ Φ(x) ∀ x ∈ Ω
                 u(x) = 0 ∀ x ∈ ∂Ω

- Models stretching a weightless membrane over a rigid obstacle defined by Φ(x)
- Implemented with MFEM and solved with PETSc/TAO
- Online workbook: https://xsdk-project.github.io/ATPESC2018HandsOnLessons/lessons/obstacle_tao/
Reduced-Space Approach (Simulation Constraints)

      min_{x,u}  f(x, u)
      subject to R(x, u) = 0

- u ∈ ℝᵐ are state variables
- R : ℝⁿ × ℝᵐ → ℝᵐ are the state equations that represent the simulation constraint (e.g.: a discretized partial differential equation)
- Invoke the implicit function theorem to eliminate the simulation constraint:

      min_x  f(x, u(x))

  where u(x) is defined by solving R(x, u) = 0 for a given x
Evaluating Derivatives (Simulation Constraints)

      min_x  f(x, u(x))

- Evaluating f(x_k, u(x_k)) requires solving R(x_k, u_k) = 0 for u_k ≡ u(x_k)
- The gradient ∇f(x, u) is a total derivative:

      ∇f(x, u) ≡ df/dx = ∂f/∂x + (∂f/∂u)(du/dx)

  where ∂(·)/∂x and ∂(·)/∂u denote partial derivatives
- This total derivative can be efficiently computed using the adjoint method