An introduction to optimization on manifolds: min_{x∈ℳ} f(x)
Nicolas Boumal, July 2020, Mathematics @ EPFL
Slides: web.math.princeton.edu/~nboumal/papers/IntroOptimManifoldsSlides2020.pdf

Page 1:

An introduction to optimization on manifolds

min_{x∈ℳ} f(x)

Nicolas Boumal, July 2020
Mathematics @ EPFL

Page 2:

Optimization on smooth manifolds: min_x f(x) subject to x ∈ ℳ

Linear spaces: unconstrained; linear equality constraints.
Low rank (matrices, tensors): recommender systems, large-scale Lyapunov equations, …
Orthonormality (Grassmann, Stiefel, rotations): dictionary learning, SfM, SLAM, PCA, ICA, SBM, electronic structure computation, …
Positivity (positive definiteness, positive orthant): metric learning, Gaussian mixtures, diffusion tensor imaging, …
Symmetry (quotient manifolds): invariance under group actions.

Page 3:

A Riemannian structure gives us gradients and Hessians

The essential tools of smooth optimization are defined generally on Riemannian manifolds.

Unified theory, broadly applicable algorithms.

First ideas from the '70s. First practical in the '90s.

Pages 4-10: (figure slides)

Page 11:

manopt.org

With Bamdev Mishra, P.-A. Absil & R. Sepulchre


Led by J. Townsend, N. Koep & S. Weichwald

Led by Ronny Bergmann

Page 12:


nicolasboumal.net/book

Page 13:

What is a smooth manifold?

(Figures: three example sets, labeled yes, yes, no.)

This overview: restricted to embedded submanifolds of linear spaces.

Page 14:

π‘₯π‘₯

𝑣𝑣

β„³ βŠ† β„° is an embedded submanifold of codimension π‘˜π‘˜ if:There exists a smooth function β„Ž:β„° β†’ π‘π‘π‘˜π‘˜ such that:

β„³ ≑ β„Ž π‘₯π‘₯ = 0 and Dβ„Ž π‘₯π‘₯ :β„° β†’ π‘π‘π‘˜π‘˜ has rank π‘˜π‘˜ βˆ€π‘₯π‘₯ ∈ β„³

E.g., sphere: β„Ž π‘₯π‘₯ = π‘₯π‘₯𝑇𝑇π‘₯π‘₯ βˆ’ 1 and Dβ„Ž π‘₯π‘₯ 𝑣𝑣 = 2π‘₯π‘₯𝑇𝑇𝑣𝑣

These properties allow us to linearize the set:β„Ž π‘₯π‘₯ + 𝑑𝑑𝑣𝑣 = β„Ž π‘₯π‘₯ + 𝑑𝑑Dβ„Ž π‘₯π‘₯ 𝑣𝑣 + 𝑂𝑂 𝑑𝑑2

This is 𝑂𝑂 𝑑𝑑2 iff 𝑣𝑣 ∈ ker Dβ„Ž π‘₯π‘₯ .β†’Tangent spaces: Tπ‘₯π‘₯β„³ = ker Dβ„Ž π‘₯π‘₯ ,

(subspace of β„°)

What is a smooth manifold? (2)

14
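For the sphere, the kernel characterization of the tangent space is easy to verify numerically. A minimal NumPy sketch (the helper `proj` and the variable names are mine, not from the slides):

```python
import numpy as np

# Sphere S^{n-1} in E = R^n: h(x) = x^T x - 1, Dh(x)[v] = 2 x^T v,
# so T_x M = ker Dh(x) = {v in R^n : x^T v = 0}.

def proj(x, v):
    """Orthogonal projection of v onto the tangent space T_x M."""
    return v - (x @ v) * x

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)        # a point on the sphere: h(x) = 0
v = rng.standard_normal(5)    # an arbitrary vector of the embedding space

u = proj(x, v)
print(abs(2 * x @ u))         # Dh(x)[u], numerically 0: u is tangent at x
```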

Page 15:

What is a smooth manifold? (3)

ℳ ⊆ ℰ is an embedded submanifold of codimension k if:

For each x ∈ ℳ there is a neighborhood U of x in ℰ and a smooth function h: U → 𝐑^k such that:

a) Dh(x): ℰ → 𝐑^k has rank k, and

b) ℳ ∩ U = h⁻¹(0) = {y ∈ U : h(y) = 0}.

(Necessary for fixed-rank matrices.)

Page 16:

Bootstrapping our set of tools

1. Smooth maps
2. Differentials of smooth maps
3. Vector fields and tangent bundles
4. Retractions
5. Riemannian metrics
6. Riemannian gradients
7. Riemannian connections
8. Riemannian Hessians
9. Riemannian covariant derivatives along curves

Page 17:

Smooth maps between manifolds

With ℳ, ℳ′ embedded in ℰ, ℰ′,

Define: a map F: ℳ → ℳ′ is smooth if:

there exists a neighborhood U of ℳ in ℰ and a smooth map F̄: U → ℰ′ such that F = F̄|_ℳ.

We call F̄ a smooth extension of F.

Page 18:

Differential of a smooth map

For F̄: ℰ → ℰ′, we have DF̄(x)[v] = lim_{t→0} (F̄(x + tv) − F̄(x))/t.

For F: ℳ → ℳ′ smooth, F(x + tv) may not be defined. But picking a smooth extension F̄, for v ∈ T_xℳ we say:

Define: DF(x)[v] = lim_{t→0} (F̄(x + tv) − F̄(x))/t = DF̄(x)[v].

Claim: this does not depend on the choice of F̄.
Claim: we retain linearity, the product rule & the chain rule.
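This can be sanity-checked numerically: the value obtained from a smooth extension matches the derivative of F along a curve on the manifold with the right position and velocity. A small NumPy sketch on the sphere (the map F(x) = x₀² is my illustrative choice, not from the slides):

```python
import numpy as np

# F: sphere -> R, F(x) = x_0^2, extended by Fbar(y) = y_0^2 on R^n.
# DFbar(x)[v] = 2 x_0 v_0; restricting to tangent v gives DF(x)[v].

rng = np.random.default_rng(7)
x = rng.standard_normal(4); x /= np.linalg.norm(x)
v = rng.standard_normal(4); v -= (x @ v) * x       # tangent at x

DF_ext = 2 * x[0] * v[0]                           # via the extension

# A curve on the sphere with c(0) = x and c'(0) = v:
c = lambda t: (x + t * v) / np.linalg.norm(x + t * v)
t = 1e-6
DF_curve = (c(t)[0]**2 - c(-t)[0]**2) / (2 * t)    # central difference

print(abs(DF_ext - DF_curve))                      # numerically 0
```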

Page 19:

Differential of a smooth map (2)

We defined DF(x) = DF̄(x)|_{T_xℳ}. Equivalently:

Claim: for each v ∈ T_xℳ, there exists a smooth curve c: 𝐑 → ℳ such that c(0) = x and c′(0) = v.

Page 20:

Differential of a smooth map (2)

We defined DF(x) = DF̄(x)|_{T_xℳ}. Equivalently:

Claim: for each v ∈ T_xℳ, there exists a smooth curve c: 𝐑 → ℳ such that c(0) = x and c′(0) = v.

Define: DF(x)[v] = lim_{t→0} (F(c(t)) − F(x))/t = (F ∘ c)′(0).

Page 21:

Vector fields and the tangent bundle

A vector field V ties to each x a vector V(x) ∈ T_xℳ.

What does it mean for V to be smooth?

Define: the tangent bundle of ℳ is the disjoint union of all tangent spaces:

Tℳ = {(x, v) : x ∈ ℳ and v ∈ T_xℳ}

Claim: Tℳ is a manifold (embedded in ℰ × ℰ).

V is smooth if it is smooth as a map from ℳ to Tℳ.

Page 22:

Aside: new manifolds from old ones

Given a manifold β„³, we can create a new manifold by considering its tangent bundle.

Here are other ways to recycle:

• Products of manifolds: ℳ × ℳ′, ℳⁿ
• Open subsets of manifolds*
• (Quotients by some equivalence relations.)

*For embedded submanifolds: subset topology.

Page 23:

Retractions: moving around

Given π‘₯π‘₯, 𝑣𝑣 ∈ Tβ„³, we can move away from π‘₯π‘₯ along 𝑣𝑣 using any 𝑐𝑐:𝐑𝐑 β†’β„³ with 𝑐𝑐 0 = π‘₯π‘₯ and 𝑐𝑐′ 0 = 𝑣𝑣.

Retractions are a smooth choice of curves over Tβ„³.

Define: A retraction is a smooth map 𝑅𝑅: Tβ„³ β†’β„³such that if 𝑐𝑐 𝑑𝑑 = 𝑅𝑅 π‘₯π‘₯, 𝑑𝑑𝑣𝑣 = 𝑅𝑅π‘₯π‘₯(𝑑𝑑𝑣𝑣)then 𝑐𝑐 0 = π‘₯π‘₯ and 𝑐𝑐′ 0 = 𝑣𝑣.

Page 24:

Retractions: moving around (2)

Define: a retraction is a smooth map R: Tℳ → ℳ such that if c(t) = R(x, tv) = R_x(tv), then c(0) = x and c′(0) = v.

Equivalently: R_x(0) = x and DR_x(0) = Id.

Typical choice: R_x(v) = projection of x + v onto ℳ:
• ℳ = 𝐑ⁿ: R_x(v) = x + v
• ℳ = Sⁿ⁻¹: R_x(v) = (x + v)/‖x + v‖
• ℳ = 𝐑_r^{m×n} (matrices of rank r): R_X(V) = SVD_r(X + V)
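The sphere case is easy to check numerically against the defining properties c(0) = x and c′(0) = v. A minimal NumPy sketch (function names mine):

```python
import numpy as np

# Metric projection retraction on the sphere: R_x(v) = (x + v)/||x + v||.
def retract(x, v):
    y = x + v
    return y / np.linalg.norm(y)

rng = np.random.default_rng(1)
x = rng.standard_normal(4); x /= np.linalg.norm(x)
v = rng.standard_normal(4); v -= (x @ v) * x        # make v tangent at x

# c(t) = R_x(t v) must satisfy c(0) = x and c'(0) = v.
t = 1e-5
c0 = retract(x, 0 * v)
cp = (retract(x, t * v) - retract(x, -t * v)) / (2 * t)  # central difference
print(np.linalg.norm(c0 - x), np.linalg.norm(cp - v))    # both numerically 0
```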

Page 25:

Towards gradients: reminders from 𝐑ⁿ

The gradient of a smooth f: 𝐑ⁿ → 𝐑 at x is defined by:

Df(x)[v] = ⟨grad f(x), v⟩ for all v ∈ 𝐑ⁿ.

In particular, with the inner product ⟨u, v⟩ = uᵀv,

(grad f(x))ᵢ = ⟨grad f(x), eᵢ⟩ = Df(x)[eᵢ] = ∂f/∂xᵢ(x).

Note: grad f is a smooth vector field on 𝐑ⁿ.

Page 26:

Riemannian metrics

Tπ‘₯π‘₯β„³ is a linear space (subspace of β„°).

Pick an inner product β‹…,β‹… π‘₯π‘₯ for each Tπ‘₯π‘₯β„³.

Define: β‹…,β‹… π‘₯π‘₯ defines a Riemannian metric on β„³ if for any two smooth vector fields 𝑉𝑉,π‘Šπ‘Š the function π‘₯π‘₯ ↦ 𝑉𝑉(π‘₯π‘₯),π‘Šπ‘Š(π‘₯π‘₯) π‘₯π‘₯ is smooth.

A Riemannian manifold is a manifold with a Riemannian metric.

Page 27:

Riemannian submanifolds

Let ⟨·,·⟩ be the inner product on ℰ.

Since T_xℳ is a linear subspace of ℰ, one choice is:

⟨u, v⟩_x = ⟨u, v⟩

Claim: this defines a Riemannian metric on β„³.

With this metric, β„³ is a Riemannian submanifold of β„°.

Page 28:

Riemannian gradient

Let f: ℳ → 𝐑 be smooth on a Riemannian manifold.

Define: the Riemannian gradient of f at x is the unique tangent vector grad f(x) at x such that:

Df(x)[v] = ⟨grad f(x), v⟩_x for all v ∈ T_xℳ.

Claim: grad f is a smooth vector field.
Claim: if x is a local optimum of f, then grad f(x) = 0.

Page 29:

Gradients on Riemannian submanifolds

Let f̄ be a smooth extension of f. For all v ∈ T_xℳ:

⟨grad f(x), v⟩_x = Df(x)[v] = Df̄(x)[v] = ⟨grad f̄(x), v⟩

Assume ℳ is a Riemannian submanifold of ℰ. Since ⟨·,·⟩ = ⟨·,·⟩_x, by uniqueness we conclude:

grad f(x) = Proj_x(grad f̄(x))

where Proj_x is the orthogonal projector from ℰ onto T_xℳ.

Page 30:

A first algorithm: gradient descent

For f: 𝐑ⁿ → 𝐑: x_{k+1} = x_k − α_k grad f(x_k)

For f: ℳ → 𝐑: x_{k+1} = R_{x_k}(−α_k grad f(x_k))

For the analysis, we need to understand f(x_{k+1}):

The composition f ∘ R_x: T_xℳ → 𝐑 is defined on a linear space, hence we may Taylor expand it.
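The retract-the-gradient-step update is already a complete algorithm. A runnable sketch for f(x) = ½xᵀAx on the sphere, assuming NumPy (the fixed step size and iteration count are ad hoc choices of mine; minimizers of this f are unit eigenvectors for the smallest eigenvalue of A):

```python
import numpy as np

# Riemannian gradient descent on the sphere for f(x) = 0.5 x^T A x,
# with the metric projection retraction.

rng = np.random.default_rng(3)
n = 8
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

grad = lambda x: A @ x - (x @ A @ x) * x            # Riemannian gradient
retract = lambda x, v: (x + v) / np.linalg.norm(x + v)

x = rng.standard_normal(n); x /= np.linalg.norm(x)
alpha = 0.1                       # fixed step; a line search would be safer
for _ in range(2000):
    x = retract(x, -alpha * grad(x))

lam_min = np.linalg.eigvalsh(A)[0]
print(x @ A @ x - lam_min)        # close to 0 at convergence
```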

Page 31:

A first algorithm: gradient descent

x_{k+1} = R_{x_k}(−α_k grad f(x_k))

The composition f ∘ R_x: T_xℳ → 𝐑 is defined on a linear space, hence we may Taylor expand it:

f(R_x(v)) = f(R_x(0)) + ⟨grad(f ∘ R_x)(0), v⟩_x + O(‖v‖_x²)
          = f(x) + ⟨grad f(x), v⟩_x + O(‖v‖_x²)

Indeed: D(f ∘ R_x)(0)[v] = Df(R_x(0))[DR_x(0)[v]] = Df(x)[v].

Page 32:

Gradient descent on ℳ

A1: f(x) ≥ f_low for all x ∈ ℳ
A2: f(R_x(v)) ≤ f(x) + ⟨v, grad f(x)⟩ + (L/2)‖v‖²

Algorithm: x_{k+1} = R_{x_k}(−(1/L) grad f(x_k))

Complexity: ‖grad f(x_k)‖ ≤ ε for some k ≤ 2L(f(x_0) − f_low)/ε².

Proof: A2 ⇒ f(x_{k+1}) ≤ f(x_k) − (1/L)‖grad f(x_k)‖² + (1/(2L))‖grad f(x_k)‖²

⇒ f(x_k) − f(x_{k+1}) ≥ (1/(2L))‖grad f(x_k)‖²

Assume (for contradiction) that ‖grad f(x_k)‖ > ε for k = 0, …, K−1. Then

A1 ⇒ f(x_0) − f_low ≥ Σ_{k=0}^{K−1} (f(x_k) − f(x_{k+1})) > (ε²/(2L))·K,

so K < 2L(f(x_0) − f_low)/ε².

Page 33:

Tips and tricks to get the gradient

Use the definition as a starting point: Df(x)[v] = ⋯

Cheap gradient principle.

Numerical check of the gradient: Taylor expand t ↦ f(R_x(tv)).
Manopt: checkgradient(problem)

Automatic differentiation: Python, Julia (not Matlab :/). This being said, for theory, we often need to manipulate the gradient "on paper" anyway.
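The Taylor-based check is easy to code: with a correct gradient, the residual f(R_x(tv)) − f(x) − t⟨grad f(x), v⟩ must decay like t². A NumPy sketch that loosely mimics what checkgradient automates (the test function and all names are my choices):

```python
import numpy as np

# If g = grad f(x), then E(t) = |f(R_x(t v)) - f(x) - t <g, v>| = O(t^2):
# dividing t by 10 should divide E(t) by about 100 (slope 2 in log-log).

rng = np.random.default_rng(4)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
f = lambda y: 0.5 * y @ A @ y
retract = lambda x, v: (x + v) / np.linalg.norm(x + v)

x = rng.standard_normal(n); x /= np.linalg.norm(x)
v = rng.standard_normal(n); v -= (x @ v) * x       # tangent direction

g = A @ x - (x @ A @ x) * x                        # candidate gradient
E = lambda t: abs(f(retract(x, t * v)) - f(x) - t * (g @ v))
for t in [1e-2, 1e-3, 1e-4]:
    print(t, E(t))                                 # shrinks ~100x per row
```

A wrong gradient would make E(t) decay only like t, which is what the slope test detects.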

Page 34:

On to second-order methods

Consider f: 𝐑ⁿ → 𝐑 smooth. Taylor says:

f(x + v) ≈ f(x) + ⟨grad f(x), v⟩ + ½⟨v, Hess f(x)[v]⟩

If Hess f(x) ≻ 0, the quadratic model is minimized by the v such that:

Hess f(x)[v] = −grad f(x)

From there, we can construct Newton's method etc.

Page 35:

Towards Hessians: reminders from 𝐑ⁿ

Consider f: 𝐑ⁿ → 𝐑 smooth.

The Hessian of f at x is a linear operator which tells us how the gradient vector field of f varies:

Hess f(x)[v] = D(grad f)(x)[v]

With ⟨u, v⟩ = uᵀv, this yields: (Hess f(x))ᵢⱼ = ∂²f/∂xᵢ∂xⱼ(x).

Notice that Hess f(x)[v] is a vector in 𝐑ⁿ.

Page 36:

A difficulty on manifolds

A smooth vector field V on ℳ is a smooth map: we already have a notion of how to differentiate it.

Example: with f(x) = ½xᵀAx on the sphere,

V(x) = grad f(x) = Proj_x(Ax) = Ax − (xᵀAx)x

DV(x)[u] = ⋯ = Proj_x(Au) − (xᵀAx)u − (xᵀAu)x

Issue: DV(x)[u] may not be a tangent vector at x!

Page 37:

Connections: a tool to differentiate vector fields

Let 𝒳(ℳ) be the set of smooth vector fields on ℳ.
Given U ∈ 𝒳(ℳ) and smooth f, define (Uf)(x) = Df(x)[U(x)].

A map ∇: 𝒳(ℳ) × 𝒳(ℳ) → 𝒳(ℳ), (U, V) ↦ ∇_U V, is a connection if:
1. ∇_{fU+gW} V = f ∇_U V + g ∇_W V
2. ∇_U (aV + bW) = a ∇_U V + b ∇_U W
3. ∇_U (fV) = (Uf)V + f ∇_U V

Example: for ℳ = ℰ, (∇_U V)(x) = DV(x)[U(x)].
Example: for ℳ ⊆ ℰ, (∇_U V)(x) = Proj_x(DV(x)[U(x)]).

Page 38:

Riemannian connections: a unique and favorable choice

Let ℳ be a Riemannian manifold.
Claim: there exists a unique connection ∇ on ℳ s.t.:

4. (∇_U V − ∇_V U) f = U(Vf) − V(Uf)
5. U⟨V, W⟩ = ⟨∇_U V, W⟩ + ⟨V, ∇_U W⟩

It is called the Riemannian connection (Levi-Civita).

Claim: if ℳ is a Riemannian submanifold of ℰ, then

(∇_U V)(x) = Proj_x(DV(x)[U(x)])

is the Riemannian connection on ℳ.

Page 39:

Riemannian Hessians

Claim: (∇_U V)(x) depends on U only through U(x).
This justifies the notation ∇_u V for u ∈ T_xℳ; e.g.: ∇_u V = Proj_x(DV(x)[u]).

Define: the Riemannian Hessian of f: ℳ → 𝐑 at x is the linear operator from T_xℳ to T_xℳ defined by:

Hess f(x)[u] = ∇_u grad f

where ∇ is the Riemannian connection.

Claim: Hess f(x) is self-adjoint.
Claim: if x is a local minimum, then Hess f(x) ≽ 0.

Page 40:

Hessians on Riemannian submanifolds

Hess𝑓𝑓 π‘₯π‘₯ 𝑒𝑒 = 𝛻𝛻𝑒𝑒grad𝑓𝑓(π‘₯π‘₯)

On a Riemannian submanifold of a linear space,

𝛻𝛻𝑒𝑒𝑉𝑉 = Projπ‘₯π‘₯ D𝑉𝑉 π‘₯π‘₯ 𝑒𝑒

Combining:

Hess𝑓𝑓 π‘₯π‘₯ 𝑒𝑒 = Projπ‘₯π‘₯ Dgrad𝑓𝑓 π‘₯π‘₯ 𝑒𝑒

Page 41:

Hessians on Riemannian submanifolds (2)

Hess𝑓𝑓 π‘₯π‘₯ 𝑒𝑒 = Projπ‘₯π‘₯ Dgrad𝑓𝑓 π‘₯π‘₯ 𝑒𝑒

Example: 𝑓𝑓 π‘₯π‘₯ = 12π‘₯π‘₯⊀𝐴𝐴π‘₯π‘₯ on the sphere in 𝐑𝐑𝑛𝑛.

𝑉𝑉 π‘₯π‘₯ = grad𝑓𝑓 π‘₯π‘₯ = Projπ‘₯π‘₯ 𝐴𝐴π‘₯π‘₯ = 𝐴𝐴π‘₯π‘₯ βˆ’ π‘₯π‘₯⊀𝐴𝐴π‘₯π‘₯ π‘₯π‘₯

D𝑉𝑉 π‘₯π‘₯ 𝑒𝑒 = Projπ‘₯π‘₯ 𝐴𝐴𝑒𝑒 βˆ’ π‘₯π‘₯⊀𝐴𝐴π‘₯π‘₯ 𝑒𝑒 βˆ’ π‘₯π‘₯βŠ€π΄π΄π‘’π‘’ π‘₯π‘₯

Hess𝑓𝑓 π‘₯π‘₯ 𝑒𝑒 = Projπ‘₯π‘₯ 𝐴𝐴𝑒𝑒 βˆ’ π‘₯π‘₯⊀𝐴𝐴π‘₯π‘₯ 𝑒𝑒

Remarkably, grad𝑓𝑓 π‘₯π‘₯ = 0 and Hess𝑓𝑓 π‘₯π‘₯ ≽ 0 iff π‘₯π‘₯ optimal.
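This example can be verified numerically: at a unit eigenvector for the smallest eigenvalue of A, the Riemannian gradient vanishes and the Riemannian Hessian is positive semidefinite on the tangent space. A NumPy sketch (helper names mine):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

proj = lambda x, u: u - (x @ u) * x
grad = lambda x: A @ x - (x @ A @ x) * x
hess = lambda x, u: proj(x, A @ u - (x @ A @ x) * u)   # formula above

w, Q = np.linalg.eigh(A)
x = Q[:, 0]                       # unit eigenvector, smallest eigenvalue w[0]

print(np.linalg.norm(grad(x)))    # numerically 0: x is a critical point

# T_x M is spanned by the other eigenvectors of A, and on that space
# Hess f(x) acts with eigenvalues w[i] - w[0] >= 0.
for i in range(1, n):
    u = Q[:, i]                   # tangent at x (orthogonal to x)
    print(u @ hess(x, u) - (w[i] - w[0]))   # numerically 0
```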

Page 42:

Newton, Taylor and Riemann

Now that we have a Hessian, we might guess:

f(R_x(v)) ≈ m_x(v) = f(x) + ⟨grad f(x), v⟩_x + ½⟨v, Hess f(x)[v]⟩_x

If Hess f(x) is invertible, m_x: T_xℳ → 𝐑 has one critical point, the solution of this linear system:

Hess f(x)[v] = −grad f(x) for v ∈ T_xℳ

Newton's method on ℳ: x_next = R_x(v).

Claim: if Hess f(x*) ≻ 0, quadratic local convergence.
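One possible sketch of this Newton iteration for f(x) = ½xᵀAx on the sphere, in NumPy. Solving the Newton system on T_xℳ by adding xxᵀ so the operator is invertible on all of 𝐑ⁿ while preserving tangent solutions is a standard trick; the bare loop with no globalization is my simplification:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

def newton_step(x):
    rho = x @ A @ x
    g = A @ x - rho * x                      # grad f(x), tangent at x
    P = np.eye(n) - np.outer(x, x)           # projector onto T_x M
    # Hess f(x) acts as P (A - rho I) P on T_x M; adding x x^T makes the
    # system nonsingular and keeps the solution tangent (g is tangent).
    M = P @ (A - rho * np.eye(n)) @ P + np.outer(x, x)
    v = np.linalg.solve(M, -g)               # Newton direction in T_x M
    return (x + v) / np.linalg.norm(x + v)   # retract

x = rng.standard_normal(n); x /= np.linalg.norm(x)
for _ in range(20):
    x = newton_step(x)

# Newton converges fast to *some* critical point (an eigenvector of A);
# which one depends on the initial guess.
print(np.linalg.norm(A @ x - (x @ A @ x) * x))   # grad f(x), numerically 0
```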

Page 43:

We need one more tool…

The truncated expansion

𝑓𝑓 𝑅𝑅π‘₯π‘₯ 𝑣𝑣 β‰ˆ 𝑓𝑓 π‘₯π‘₯ + grad𝑓𝑓 π‘₯π‘₯ ,𝑣𝑣 π‘₯π‘₯ +12𝑣𝑣, Hess𝑓𝑓 π‘₯π‘₯ 𝑣𝑣 π‘₯π‘₯

is not quite correct (well, not always…)

To see why, let’s expand 𝑓𝑓 along some smooth curve.

Page 44:

We need one more tool… (2)

Let's expand f along some smooth curve.

With c: 𝐑 → ℳ s.t. c(0) = x and c′(0) = v, the composition g = f ∘ c maps 𝐑 → 𝐑, so Taylor holds:

g(t) ≈ g(0) + t·g′(0) + (t²/2)·g″(0)

g(0) = f(x)

g′(t) = Df(c(t))[c′(t)] = ⟨grad f(c(t)), c′(t)⟩_{c(t)}

g′(0) = ⟨grad f(x), v⟩_x

g″(0) = ???

Page 45:

Differentiating vector fields along curves

A map Z: 𝐑 → Tℳ is a vector field along c: 𝐑 → ℳ if Z(t) is a tangent vector at c(t).
Let 𝒳(c) denote the set of smooth such fields.

Claim: there exists a unique operator D/dt: 𝒳(c) → 𝒳(c) s.t.:

1. D/dt (aY + bZ) = a·D/dt Y + b·D/dt Z
2. D/dt (gZ) = g′Z + g·D/dt Z
3. D/dt (U(c(t))) = ∇_{c′(t)} U
4. d/dt ⟨Y(t), Z(t)⟩_{c(t)} = ⟨D/dt Y(t), Z(t)⟩_{c(t)} + ⟨Y(t), D/dt Z(t)⟩_{c(t)}

where ∇ is the Riemannian connection.

Page 46:

Differentiating fields along curves (2)

Claim: there exists a unique operator D/dt: 𝒳(c) → 𝒳(c) satisfying properties 1-4 of the previous slide, where ∇ is the Riemannian connection.

Claim: if ℳ is a Riemannian submanifold of ℰ, then

D/dt Z(t) = Proj_{c(t)}(d/dt Z(t)).

Page 47:

With g(t) = f(c(t)) and g′(t) = ⟨grad f(c(t)), c′(t)⟩_{c(t)}:

g″(t) = ⟨D/dt grad f(c(t)), c′(t)⟩_{c(t)} + ⟨grad f(c(t)), D/dt c′(t)⟩_{c(t)}
      = ⟨∇_{c′(t)} grad f, c′(t)⟩_{c(t)} + ⟨grad f(c(t)), c″(t)⟩_{c(t)}

g″(0) = ⟨Hess f(x)[v], v⟩_x + ⟨grad f(x), c″(0)⟩_x

Page 48:

With g(t) = f(c(t)) as above, g″(0) = ⟨Hess f(x)[v], v⟩_x + ⟨grad f(x), c″(0)⟩_x, hence:

f(c(t)) = f(x) + t·⟨grad f(x), v⟩_x + (t²/2)·⟨Hess f(x)[v], v⟩_x + (t²/2)·⟨grad f(x), c″(0)⟩_x + O(t³)

The annoying term vanishes at critical points and for special curves (special retractions). Mostly fine for optimization.

Page 49:

Trust-region method: Newton's with a safeguard

With the same tools, we can design a Riemannian trust-region method: RTR (Absil, Baker & Gallivan '07).

It approximately minimizes a quadratic model of f ∘ R_{x_k} restricted to a ball in T_{x_k}ℳ, with dynamic radius.

Complexity is known. Excellent performance, also with an approximate Hessian (e.g., finite differences of grad f).

In Manopt, call trustregions(problem).

Page 50:

In the lecture notes:
• Proofs for all claims in these slides
• References to the (growing) literature
• Short descriptions of applications
• Fully worked out manifolds
  E.g.: Stiefel, fixed-rank matrices, general {x ∈ ℰ : h(x) = 0}
• Details about computation, pitfalls and tricks
  E.g.: how to compute gradients, checkgradient, checkhessian
• Theory for general manifolds
• Theory for quotient manifolds
• Discussion of more advanced geometric tools
  E.g.: distance, exp, log, transports, Lipschitz, finite differences
• Basics of geodesic convexity

nicolasboumal.net/book