Numerical Optimization with PETSc/TAO

Alp Dener, Todd Munson
Mathematics and Computer Science Division, Argonne National Laboratory

Exascale Computing Project Annual Meeting, Houston, TX, January 14, 2019
Outline

- The Basics: Unconstrained Optimization; Convex Example; Nonconvex Example
- Globalization: Locally Convergent Algorithm; Trust-Region; Line Search
- Algorithms: Bound Constraints; First-order Methods; Second-order Methods; How to Choose?
- Simulation Constraints: Reduced-Space Approach; Evaluating Derivatives
Unconstrained Optimization (The Basics)

- Generic optimization problem

      min_x  f(x)

  - x ∈ ℝⁿ are the optimization variables
  - f : ℝⁿ → ℝ is the objective function
- Problem classification
  - Differentiability: discontinuous, C⁰, C¹, C², ...
  - Topology: convex vs. nonconvex
  - Type: linear, quadratic, nonlinear, etc.
Unconstrained Optimization (The Basics)

      min_x  f(x)

- Solve ∇f(x*) = 0 (first-order necessary condition)
- x* is a global minimum if f(x*) ≤ f(x) ∀ x ∈ ℝⁿ
  - Global solutions need not be unique
  - All global solutions have the same objective value
- x* is a local minimum if f(x*) ≤ f(x) ∀ x ∈ N(x*), where N(x*) is a neighborhood of x*
  - Local solutions need not be unique
  - Local solutions need not have the same objective value
TAO Example: Rosenbrock Function (The Basics)

      min_{x,y}  f(x, y) = (1 − x)² + 99(y − x²)²
TAO Example: Rosenbrock Function (The Basics)

PetscErrorCode FormFunctionGradient(Tao tao, Vec X, PetscReal *f, Vec G, void *ptr)
{
  PetscErrorCode    ierr;
  PetscReal         ff = 0;
  PetscScalar       *g;
  const PetscScalar *x;

  PetscFunctionBeginUser;
  /* Get pointers to vector data */
  ierr = VecGetArrayRead(X, &x); CHKERRQ(ierr);
  ierr = VecGetArray(G, &g); CHKERRQ(ierr);

  /* Compute function value and gradient */
  ff   = (1 - x[0])*(1 - x[0]) + 99*(x[1] - x[0]*x[0])*(x[1] - x[0]*x[0]);
  g[0] = -2*(1 - x[0]) - 4*99*(x[1] - x[0]*x[0])*x[0];
  g[1] = 2*99*(x[1] - x[0]*x[0]);

  /* Restore vectors */
  ierr = VecRestoreArrayRead(X, &x); CHKERRQ(ierr);
  ierr = VecRestoreArray(G, &g); CHKERRQ(ierr);
  *f = ff;
  PetscFunctionReturn(0);
}
TAO Example: Rosenbrock Function (The Basics)

int main(int argc, char **argv)
{
  ...
  ierr = PetscInitialize(&argc, &argv, (char *)0, help); if (ierr) return ierr;
  ...
  /* Create TAO solver with desired solution method */
  ierr = TaoCreate(PETSC_COMM_SELF, &tao); CHKERRQ(ierr);
  ierr = TaoSetType(tao, TAOBQNLS); CHKERRQ(ierr);

  /* Set solution vec and initial guess */
  ierr = VecSet(x, zero); CHKERRQ(ierr);
  ierr = TaoSetInitialVector(tao, x); CHKERRQ(ierr);

  /* Set user routine for function and gradient evaluation */
  ierr = TaoSetObjectiveAndGradientRoutine(tao, FormFunctionGradient, &user); CHKERRQ(ierr);

  /* Check for command-line options and start the solution */
  ierr = TaoSetFromOptions(tao); CHKERRQ(ierr);
  ierr = TaoSolve(tao); CHKERRQ(ierr);
  ...
  /* Clean up TAO and PETSc */
  ierr = TaoDestroy(&tao); CHKERRQ(ierr);
  ierr = PetscFinalize();
}
TAO Example: Rosenbrock Function (The Basics)

$ cd src/tao/unconstrained/examples/tutorials
$ make rosenbrock1
$ ./rosenbrock1 -tao_smonitor -tao_gatol 1e-4

iter =  0, Function value 1.,           Residual: 2.
iter =  1, Function value 0.791905,     Residual: 3.15898
iter =  2, Function value 0.735272,     Residual: 8.61386
iter =  3, Function value 0.599666,     Residual: 2.695
iter =  4, Function value 0.471982,     Residual: 1.38514
iter =  5, Function value 0.390194,     Residual: 4.06039
iter =  6, Function value 0.313901,     Residual: 6.38821
iter =  7, Function value 0.198935,     Residual: 0.65438
iter =  8, Function value 0.159664,     Residual: 2.29449
iter =  9, Function value 0.130565,     Residual: 3.76042
iter = 10, Function value 0.0863548,    Residual: 4.34492
iter = 11, Function value 0.0468268,    Residual: 0.341378
iter = 12, Function value 0.0298555,    Residual: 2.55348
iter = 13, Function value 0.0190191,    Residual: 2.93464
iter = 14, Function value 0.00645537,   Residual: 0.526141
iter = 15, Function value 0.00373702,   Residual: 2.4349
iter = 16, Function value 0.00149341,   Residual: 0.164182
iter = 17, Function value 0.000622109,  Residual: 0.167388
iter = 18, Function value 3.7352e-05,   Residual: 0.233549
iter = 19, Function value 6.65726e-06,  Residual: 0.102553
iter = 20, Function value 5.1701e-09,   Residual: 0.00232561
iter = 21, Function value 2.16521e-12,  Residual: 5.29036e-05
Convex Example (The Basics)

[Plot: f(x) = (x − 0.5)⁴ + (x + 0.5)² − 0.5 with a tangent line lying entirely below the curve]

- f : ℝⁿ → ℝ is convex if f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) ∀ x, y ∈ ℝⁿ
Convex Example (The Basics)

[Plot: f(x) = (x − 0.5)⁴ + (x + 0.5)² − 0.5]

- Global minimum: ∇f(x) = 4(x − 0.5)³ + 2(x + 0.5) = 0
Nonconvex Example (The Basics)

[Plot: f(x) = −x⁴ + 5x² + x − 4 with a tangent line crossing the curve]

- Portions of the function lie below the tangent line
Nonconvex Example (The Basics)

[Plot: f(x) = −x⁴ + 5x² + x − 4]

- Stationary points: ∇f(x) = −4x³ + 10x + 1 = 0
- Local maximizers: ∇²f(x) = −12x² + 10 < 0
- Local minimizer: ∇²f(x) = −12x² + 10 > 0
Nonconvex Example (The Basics)

[Plot: f(x) = −x⁴ + 5x² + x − 4]

- The ∇f(x*) = 0 condition is necessary but not sufficient
- ∇²f(x*) positive definite at the solution is the second-order sufficient condition,
  i.e.: yᵀ ∇²f(x*) y > 0 ∀ y ≠ 0
Locally Convergent Algorithm (Globalization)

- Most good algorithms are variants of Newton's method
- Attempt to compute local minimizers for f : ℝⁿ → ℝ
  1. Form a Taylor series approximation around x_k:

         f(x) ≈ f(x_k) + ∇f(x_k)ᵀ(x − x_k) + ½ (x − x_k)ᵀ ∇²f(x_k) (x − x_k)

  2. Solve the quadratic optimization problem for d_k:

         min_{d_k}  f(x_k) + ∇f(x_k)ᵀ d_k + ½ d_kᵀ ∇²f(x_k) d_k

     - Convex case: solutions satisfy ∇²f(x_k) d_k = −∇f(x_k)
     - Nonconvex case: can be unbounded; use trust-region or line search methods
  3. Update the iterate and repeat until convergence
Trust-Region (Globalization)

1. Initialize trust radius Δ
2. Solve the trust-region subproblem

       min_{d_k}  f(x_k) + d_kᵀ ∇f(x_k) + ½ d_kᵀ H(x_k) d_k
       subject to ‖d_k‖₂ ≤ Δ

3. Compute predicted decrease δf_pred = d_kᵀ ∇f(x_k) + ½ d_kᵀ H(x_k) d_k
4. Compute actual decrease δf_actual = f(x_k + d_k) − f(x_k)
5. Accept iterate x_{k+1} = x_k + d_k if δf_actual ≥ δf_pred
6. Update trust radius
   - Increase Δ if δf_actual ≫ δf_pred
   - Decrease Δ if δf_actual < δf_pred
Illustration on Nonconvex Problem (Globalization)

[Sequence of plots: globalized iterates on f(x) = −x⁴ + 5x² + x − 4, shown frame by frame]
Line Search (Globalization)

1. Initialize perturbation λ_k to zero
2. Solve the perturbed quadratic model

       min_{d_k}  f(x_k) + d_kᵀ ∇f(x_k) + ½ d_kᵀ [H(x_k) + λ_k I] d_k

3. Search f(x_k + α_k d_k) for an appropriate step length α_k
   - Moré-Thuente line search
     - Satisfies the strong Wolfe conditions:

           f(x_k + α_k d_k) ≤ f(x_k) + σ α_k ∇f(x_k)ᵀ d_k
           |∇f(x_k + α_k d_k)ᵀ d_k| ≤ δ |∇f(x_k)ᵀ d_k|

     - Trial step lengths from cubic interpolation
4. Update the iterate
   - If a valid α_k is found, accept x_{k+1} = x_k + α_k d_k and decrease λ_k
   - If no valid α_k is found, increase λ_k
Bound Constraints (Algorithms)

      min_x  f(x)
      subject to b_l ≤ x ≤ b_u

- b_l, b_u ∈ ℝⁿ are bounds on the optimization variables
- Also called "box" constraints
- Active-set approach
  - Compute a search direction only for variables inside the bounds
  - Snap variables outside the bounds to the bounds
  - Project trial steps into the bounds during globalization
- Bounded algorithms in TAO also solve unconstrained problems
First-order Methods (Algorithms)

- Require only ∇f(x); the Hessian ∇²f(x) is approximated
- Objective function must be C¹-continuous
- TAO solvers
  - BNCG – Bounded Nonlinear Conjugate Gradient
    - Steepest descent – do not use!
    - Classic methods: Fletcher-Reeves, Polak-Ribière-Polyak, Hestenes-Stiefel
    - Modern methods: SSML-BFGS, Dai-Yuan, Hager-Zhang, Dai-Kou, Kou-Dai
    - Nonlinear preconditioning with diagonalized quasi-Newton
  - BQNLS – Bounded Quasi-Newton Line Search
    - Limited-memory matrix-free implementation
    - Restricted Broyden family of methods (default: BFGS)
    - Sparse diagonal Hessian initialization
Second-order Methods (Algorithms)

- Require both ∇f(x) and ∇²f(x)
- Objective function must be C²-continuous
- TAO solvers
  - BNLS – Bounded Newton Line Search
  - BNTR – Bounded Newton Trust-Region
  - BNTL – Bounded Newton Trust-Region w/ Line Search Fallback
- Iteratively invert ∇²f(x) using PETSc KSP methods
  - BNTR and BNTL use trust-region Krylov methods (e.g.: Steihaug-Toint Conjugate Gradient)
  - Preconditioned with a quasi-Newton approximation to ∇²f(x)
  - PETSc preconditioners are also available (e.g.: ICC)
Providing Hessians in TAO (Algorithms)

PetscErrorCode FormHessian(Tao tao, Vec X, Mat H, Mat Hpre, void *ptr)
...

int main(int argc, char **argv)
{
  ...
  ierr = TaoSetHessianRoutine(tao, H, Hpre, FormHessian, &user); CHKERRQ(ierr);
  ...
}

- Evaluate the Hessian (H) and its preconditioner (Hpre) at the point X
- Hpre and H can be the same Mat object
How to Choose? (Algorithms)

Second-order methods:
- Achieve the solution in few nonlinear iterations
- Can efficiently converge to a tighter tolerance
- May be slower than first-order methods if the Hessian computation is not efficient

First-order methods:
- Small memory footprint, suitable for large-scale problems
- Avoid expensive and challenging Hessian computation
- May take many nonlinear iterations and require many function and gradient evaluations
Switching Solvers in TAO (Algorithms)

$ ./rosenbrock1 -tao_monitor -tao_type bncg -tao_bncg_type ssml_bfgs
$ ./rosenbrock1 -tao_monitor -tao_type bqnls -tao_bqnls_mat_lmvm_num_vecs 5
$ ./rosenbrock1 -tao_monitor -tao_type bnls -tao_bnk_pc_type icc

- Switch solvers with -tao_type <bncg, bqnls, bnls, ...>
- Switch CG types with -tao_bncg_type <gd, fr, hz, ssml_bfgs, ...>
- Control the limited-memory quasi-Newton history size with -tao_bqnls_mat_lmvm_num_vecs N
- Change the Newton-Krylov preconditioner type with -tao_bnk_pc_type <ilu, icc, lmvm, ...>
- See the PETSc/TAO manual pages online for the complete list of algorithm options
Performance Profiles: First-order (Algorithms)

[Performance profile, P_c(r_{p,c} ≤ π : c ∈ C) vs. π, comparing L-BFGS, SSML-BFGS, and Steepest Descent]
Performance Profiles: Second-order (Algorithms)

[Performance profile, P_c(r_{p,c} ≤ π : c ∈ C) vs. π, comparing BNLS, BNTR, and BNTL]
Performance Profiles: Versus (Algorithms)

[Performance profile, P_c(r_{p,c} ≤ π : c ∈ C) vs. π, comparing BNLS, L-BFGS, and SSML-BFGS]
Performance Profiles: Versus (Algorithms)

[Percentage of problems solved vs. CPU time (s), comparing BNLS, L-BFGS, and SSML-BFGS]
TAO Example: The Obstacle Problem (Algorithms)

      min_u  ∫_Ω |∇u|² dx
      subject to u(x) ≥ Φ(x) ∀ x ∈ Ω
                 u(x) = 0 ∀ x ∈ ∂Ω

- Models stretching a weightless membrane over a rigid obstacle defined by Φ(x)
- Implemented with MFEM and solved with PETSc/TAO
- Online workbook: https://xsdk-project.github.io/ATPESC2018HandsOnLessons/lessons/obstacle_tao/
Reduced-Space Approach (Simulation Constraints)

      min_{x,u}  f(x, u)
      subject to R(x, u) = 0

- u ∈ ℝᵐ are state variables
- R : ℝⁿ × ℝᵐ → ℝᵐ are the state equations that represent the simulation constraint (e.g.: a discretized partial differential equation)
- Invoke the implicit function theorem to eliminate the simulation constraint:

      min_x  f(x, u(x))

  where u(x) is defined by solving R(x, u) = 0 for a given x
Evaluating Derivatives (Simulation Constraints)

      min_x  f(x, u(x))

- Evaluating f(x_k, u(x_k)) requires solving R(x_k, u_k) = 0 for u_k ≡ u(x_k)
- The gradient ∇f(x, u) is a total derivative:

      ∇f(x, u) ≡ df/dx = ∂f/∂x + (∂f/∂u)(du/dx)

  where ∂(·)/∂x and ∂(·)/∂u denote partial derivatives
- This total derivative can be efficiently computed using the adjoint method