An introduction to optimization on manifolds

    min f(x) subject to x ∈ M

Nicolas Boumal, July 2020
Mathematics @ EPFL
Optimization on smooth manifolds: min f(x) subject to x ∈ M
Linear spaces: unconstrained; linear equality constraints
Low rank (matrices, tensors): recommender systems, large-scale Lyapunov equations, …
Orthonormality (Grassmann, Stiefel, rotations): dictionary learning, SfM, SLAM, PCA, ICA, SBM, electronic structure computation, …
Positivity (positive definiteness, positive orthant): metric learning, Gaussian mixtures, diffusion tensor imaging, …
Symmetry (quotient manifolds): invariance under group actions
A Riemannian structure gives us gradients and Hessians
The essential tools of smooth optimization are defined generally on Riemannian manifolds.
Unified theory, broadly applicable algorithms.
First ideas from the 1970s; first practical in the 1990s.
manopt.org
Manopt (Matlab), with Bamdev Mishra, P.-A. Absil & R. Sepulchre
Pymanopt, led by J. Townsend, N. Koep & S. Weichwald
Manopt.jl, led by Ronny Bergmann
nicolasboumal.net/book
What is a smooth manifold?
[Figure: three example sets; the first two are smooth manifolds ("Yes"), the third is not ("No").]
This overview: restricted to embedded submanifolds of linear spaces.
M ⊆ E is an embedded submanifold of codimension k if:
There exists a smooth function h: E → ℝ^k such that:
    M = {x ∈ E : h(x) = 0} and Dh(x): E → ℝ^k has rank k for all x ∈ M

E.g., sphere: h(x) = xᵀx − 1 and Dh(x)[v] = 2xᵀv

These properties allow us to linearize the set:
    h(x + tv) = h(x) + t Dh(x)[v] + O(t²)
This is O(t²) iff v ∈ ker Dh(x).
Tangent spaces: T_x M = ker Dh(x)   (a subspace of E)
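As a concrete sanity check, here is a small NumPy sketch of the sphere example above: `h(x) = xᵀx − 1` defines the unit sphere, and the tangent space at x is ker Dh(x), reached by orthogonal projection. The helper names (`h`, `Dh`, `project_to_tangent`) are mine, for illustration only.

```python
import numpy as np

# The sphere S^{n-1} as an embedded submanifold of R^n (a sketch):
# h(x) = x^T x - 1, with Dh(x)[v] = 2 x^T v of rank 1, so codimension 1.

def h(x):
    return x @ x - 1.0

def Dh(x, v):
    return 2.0 * (x @ v)

def project_to_tangent(x, u):
    # Projection of an ambient vector u onto T_x S^{n-1} = {v : x^T v = 0}.
    return u - (x @ u) * x

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
x /= np.linalg.norm(x)                      # a point with h(x) = 0
v = project_to_tangent(x, rng.standard_normal(4))

print(abs(h(x)) < 1e-12)                    # x lies on the manifold
print(abs(Dh(x, v)) < 1e-12)                # v lies in ker Dh(x) = T_x M
```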
What is a smooth manifold? (2)
M ⊆ E is an embedded submanifold of codimension k if:

For each x ∈ M there is a neighborhood U of x in E and a smooth function h: U → ℝ^k such that:
a) Dh(x): E → ℝ^k has rank k, and
b) M ∩ U = h⁻¹(0) = {y ∈ U : h(y) = 0}

(This local version is necessary for fixed-rank matrices.)
What is a smooth manifold? (3)
Bootstrapping our set of tools
1. Smooth maps
2. Differentials of smooth maps
3. Vector fields and tangent bundles
4. Retractions
5. Riemannian metrics
6. Riemannian gradients
7. Riemannian connections
8. Riemannian Hessians
9. Riemannian covariant derivatives along curves
Smooth maps between manifolds
With M, M′ embedded in E, E′:

Define: a map F: M → M′ is smooth if there exists a neighborhood U of M in E and a smooth map F̄: U → E′ such that F = F̄|_M.

We call F̄ a smooth extension of F.
Differential of a smooth map
For F̄: E → E′, we have DF̄(x)[v] = lim_{t→0} (F̄(x + tv) − F̄(x)) / t.

For F: M → M′ smooth, F(x + tv) may not be defined.
But picking a smooth extension F̄, for v ∈ T_x M we say:

Define: DF(x)[v] = lim_{t→0} (F̄(x + tv) − F̄(x)) / t = DF̄(x)[v].
Claim: this does not depend on the choice of F̄.
Claim: we retain linearity, the product rule and the chain rule.
Differential of a smooth map (2)

We defined DF(x) = DF̄(x)|_{T_x M}. Equivalently:

Claim: for each v ∈ T_x M, there exists a smooth curve c: I → M such that c(0) = x and c′(0) = v.

Define: DF(x)[v] = lim_{t→0} (F(c(t)) − F(x)) / t = (F ∘ c)′(0).
Vector fields and the tangent bundle

A vector field V ties to each x a vector V(x) ∈ T_x M.
What does it mean for V to be smooth?

Define: the tangent bundle of M is the disjoint union of all tangent spaces:
    TM = {(x, v) : x ∈ M and v ∈ T_x M}

Claim: TM is a manifold (embedded in E × E).

V is smooth if it is smooth as a map from M to TM.
Aside: new manifolds from old ones
Given a manifold β³, we can create a new manifold by considering its tangent bundle.
Here are other ways to recycle:
• Products of manifolds: M × M′, M^n
• Open subsets of manifolds*
• (Quotients by some equivalence relations.)

*For embedded submanifolds: subset topology.
Retractions: moving around
Given (x, v) ∈ TM, we can move away from x along v using any curve c: I → M with c(0) = x and c′(0) = v.

Retractions are a smooth choice of such curves over TM.

Define: a retraction is a smooth map R: TM → M such that if c(t) = R(x, tv) = R_x(tv), then c(0) = x and c′(0) = v.
Retractions: moving around (2)

Define: a retraction is a smooth map R: TM → M such that if c(t) = R(x, tv) = R_x(tv), then c(0) = x and c′(0) = v.

Equivalently: R_x(0) = x and DR_x(0) = Id.

Typical choice: R_x(v) = projection of x + v to M:
• M = ℝ^n:  R_x(v) = x + v
• M = S^{n−1}:  R_x(v) = (x + v) / ‖x + v‖
• M = fixed-rank matrices in ℝ^{m×n}:  R_X(V) = SVD_r(X + V)
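A quick numerical sketch of the sphere case: R_x(v) = (x + v)/‖x + v‖, with the two defining properties c(0) = x and c′(0) = v checked by a finite difference. The helper name `retract` is mine.

```python
import numpy as np

# Metric-projection retraction on the sphere (a sketch):
# R_x(v) = (x + v) / ||x + v||.  Check R_x(0) = x and DR_x(0) = Id
# by differentiating t -> R_x(t v) at t = 0.

def retract(x, v):
    y = x + v
    return y / np.linalg.norm(y)

rng = np.random.default_rng(1)
x = rng.standard_normal(3); x /= np.linalg.norm(x)
v = rng.standard_normal(3); v -= (x @ v) * x     # tangent at x

t = 1e-6
print(np.allclose(retract(x, np.zeros(3)), x))                  # c(0) = x
print(np.allclose((retract(x, t * v) - x) / t, v, atol=1e-4))   # c'(0) = v
```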
Towards gradients: reminders from ℝ^n

The gradient of a smooth f: ℝ^n → ℝ at x is defined by:
    Df(x)[v] = ⟨grad f(x), v⟩ for all v ∈ ℝ^n.

In particular, with the inner product ⟨u, v⟩ = uᵀv:
    (grad f(x))_i = ⟨grad f(x), e_i⟩ = Df(x)[e_i] = ∂f/∂x_i (x).

Note: grad f is a smooth vector field on ℝ^n.
Riemannian metrics
T_x M is a linear space (a subspace of E).

Pick an inner product ⟨·,·⟩_x for each T_x M.

Define: ⟨·,·⟩_x defines a Riemannian metric on M if for any two smooth vector fields V, W the function x ↦ ⟨V(x), W(x)⟩_x is smooth.
A Riemannian manifold is a manifold with a Riemannian metric.
Riemannian submanifolds
Let ⟨·,·⟩ be the inner product on E.

Since T_x M is a linear subspace of E, one choice is:
    ⟨u, v⟩_x = ⟨u, v⟩
Claim: this defines a Riemannian metric on β³.
With this metric, β³ is a Riemannian submanifold of β°.
Riemannian gradient
Let f: M → ℝ be smooth on a Riemannian manifold.

Define: the Riemannian gradient of f at x is the unique tangent vector grad f(x) at x such that:
    Df(x)[v] = ⟨grad f(x), v⟩_x for all v ∈ T_x M

Claim: grad f is a smooth vector field.
Claim: if x is a local optimum of f, then grad f(x) = 0.
Gradients on Riemannian submanifolds

Assume M is a Riemannian submanifold of E.
Let f̄ be a smooth extension of f. For all v ∈ T_x M:
    ⟨grad f(x), v⟩_x = Df(x)[v] = Df̄(x)[v] = ⟨grad f̄(x), v⟩

Since ⟨·,·⟩_x is the restriction of ⟨·,·⟩, by uniqueness we conclude:
    grad f(x) = Proj_x(grad f̄(x))

where Proj_x is the orthogonal projector from E to T_x M.
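This projection recipe can be checked numerically on the sphere with f̄(x) = ½ xᵀAx, whose Euclidean gradient is Ax (for symmetric A). The helper name `rgrad` is mine; this is a sketch, not toolbox code.

```python
import numpy as np

# Gradient on a Riemannian submanifold (a sketch): with the extension
# fbar(x) = 0.5 x^T A x on R^n, grad fbar(x) = A x, and on the sphere
# grad f(x) = Proj_x(A x) = A x - (x^T A x) x.

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2

def rgrad(x):
    g = A @ x                     # Euclidean gradient of the extension
    return g - (x @ g) * x        # orthogonal projection onto T_x M

x = rng.standard_normal(4); x /= np.linalg.norm(x)
v = rng.standard_normal(4); v -= (x @ v) * x     # tangent at x

print(abs(x @ rgrad(x)) < 1e-12)                 # grad f(x) is tangent
print(np.isclose(v @ (A @ x), rgrad(x) @ v))     # Df(x)[v] = <grad f(x), v>_x
```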
A first algorithm: gradient descent
For f: ℝ^n → ℝ:  x_{k+1} = x_k − α_k grad f(x_k)
For f: M → ℝ:   x_{k+1} = R_{x_k}(−α_k grad f(x_k))

For the analysis, we need to understand f(x_{k+1}):

The composition f ∘ R_x: T_x M → ℝ is defined on a linear space, hence we may Taylor expand it.
A first algorithm: gradient descent (2)

    x_{k+1} = R_{x_k}(−α_k grad f(x_k))

The composition f ∘ R_x: T_x M → ℝ is defined on a linear space, hence we may Taylor expand it:

    f(R_x(v)) = f(R_x(0)) + ⟨grad(f ∘ R_x)(0), v⟩_x + O(‖v‖_x²)
              = f(x) + ⟨grad f(x), v⟩_x + O(‖v‖_x²)

Indeed: D(f ∘ R_x)(0)[v] = Df(R_x(0))[DR_x(0)[v]] = Df(x)[v].
Gradient descent on M

A1: f(x) ≥ f_low for all x ∈ M
A2: f(R_x(v)) ≤ f(x) + ⟨v, grad f(x)⟩ + (L/2)‖v‖²

Algorithm: x_{k+1} = R_{x_k}(−(1/L) grad f(x_k))

Complexity: ‖grad f(x_k)‖ ≤ ε for some k ≤ 2L(f(x_0) − f_low) · (1/ε²)

A2 ⇒ f(x_{k+1}) ≤ f(x_k) − (1/L)‖grad f(x_k)‖² + (1/2L)‖grad f(x_k)‖²
   ⇒ f(x_k) − f(x_{k+1}) ≥ (1/2L)‖grad f(x_k)‖²

If ‖grad f(x_k)‖ > ε for all k < K (for contradiction):
    f(x_0) − f_low ≥ Σ_{k=0}^{K−1} [f(x_k) − f(x_{k+1})] > K ε² / (2L)
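The algorithm above can be sketched on the sphere for f(x) = ½ xᵀAx, where the minimum is attained at an eigenvector for the smallest eigenvalue of A. A diagonal A with a known spectrum is used here so convergence is easy to verify; the helper names (`retract`, `rgrad`) and the fixed step size α = 0.1 are my choices for the sketch.

```python
import numpy as np

# Riemannian gradient descent on the sphere for f(x) = 0.5 x^T A x
# (a sketch with a fixed step size). Minimizing f over the sphere
# finds a unit eigenvector for the smallest eigenvalue of A (-2 here).

A = np.diag([-2.0, 0.0, 1.0, 3.0, 5.0])   # known spectrum for the check

def retract(x, v):
    y = x + v
    return y / np.linalg.norm(y)          # metric-projection retraction

def rgrad(x):
    g = A @ x                             # Euclidean gradient of extension
    return g - (x @ g) * x                # project onto T_x M

rng = np.random.default_rng(3)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)

alpha = 0.1                               # fixed step size
for _ in range(2000):
    x = retract(x, -alpha * rgrad(x))

print(x @ A @ x)   # close to the smallest eigenvalue of A
```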
Tips and tricks to get the gradient

Use the definition as a starting point: Df(x)[v] = ⋯

Cheap gradient principle.

Numerical check of the gradient: Taylor expand t ↦ f(R_x(tv)).
Manopt: checkgradient(problem)

Automatic differentiation: Python, Julia (not Matlab :/).
This being said, for theory we often need to manipulate the gradient "on paper" anyway.
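The Taylor test mentioned above can be sketched as follows: if the gradient is correct, the residual e(t) = |f(R_x(tv)) − f(x) − t⟨grad f(x), v⟩| decays like O(t²). This is a hand-rolled sketch of the idea behind Manopt's checkgradient, not its actual implementation; all helper names are mine.

```python
import numpy as np

# Numerical gradient check (a sketch): the first-order Taylor residual
#   e(t) = |f(R_x(t v)) - f(x) - t <grad f(x), v>|
# should decay like O(t^2) when grad f is correct.

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2

f = lambda x: 0.5 * (x @ A @ x)

def retract(x, v):
    y = x + v
    return y / np.linalg.norm(y)

def rgrad(x):
    g = A @ x
    return g - (x @ g) * x

x = rng.standard_normal(4); x /= np.linalg.norm(x)
v = rng.standard_normal(4); v -= (x @ v) * x   # tangent direction

def err(t):
    return abs(f(retract(x, t * v)) - f(x) - t * (rgrad(x) @ v))

# Second-order decay: shrinking t by 10 shrinks the error by ~100.
print(err(1e-3), err(1e-4))
```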
On to second-order methods
Consider f: ℝ^n → ℝ smooth. Taylor says:

    f(x + v) ≈ f(x) + ⟨grad f(x), v⟩ + (1/2)⟨v, Hess f(x)[v]⟩

If Hess f(x) ≻ 0, the quadratic model is minimized for v such that:

    Hess f(x)[v] = −grad f(x)

From there, we can construct Newton's method, etc.
Towards Hessians: reminders from ℝ^n

Consider f: ℝ^n → ℝ smooth.

The Hessian of f at x is a linear operator which tells us how the gradient vector field of f varies:

    Hess f(x)[v] = D(grad f)(x)[v]

With ⟨u, v⟩ = uᵀv, this yields: (Hess f(x)[e_j])_i = ∂²f/∂x_i∂x_j (x).

Notice that Hess f(x)[v] is a vector in ℝ^n.
A difficulty on manifolds
A smooth vector field ξ on M is a smooth map: we already have a notion of how to differentiate it.

Example: with f(x) = (1/2)xᵀAx on the sphere,

    ξ(x) = grad f(x) = Proj_x(Ax) = Ax − (xᵀAx)x
    Dξ(x)[u] = ⋯ = Proj_x(Au) − (xᵀAx)u − (xᵀAu)x

Issue: Dξ(x)[u] may not be a tangent vector at x!
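The issue can be made concrete numerically: the classical derivative of ξ has normal component −(xᵀAu)x, which is generically nonzero, and projecting restores tangency. This is a sketch with my own helper names (`proj`, `xi`, `Dxi`).

```python
import numpy as np

# The difficulty, concretely (a sketch): on the sphere, the classical
# derivative of the tangent field xi(x) = Ax - (x^T A x) x in a tangent
# direction u has normal component -(x^T A u) x, so it is generally not
# tangent. Projecting it back onto T_x M fixes this.

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 3)); A = (A + A.T) / 2

proj = lambda x, w: w - (x @ w) * x
xi = lambda x: A @ x - (x @ A @ x) * x

def Dxi(x, u):
    # Product rule applied to xi, using A = A^T and x^T u = 0:
    return A @ u - 2 * (x @ A @ u) * x - (x @ A @ x) * u

x = rng.standard_normal(3); x /= np.linalg.norm(x)
u = proj(x, rng.standard_normal(3))

d = Dxi(x, u)
print(x @ d, -(x @ A @ u))          # equal: the normal component of d
print(abs(x @ proj(x, d)) < 1e-12)  # the projection is tangent again
```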
Connections: a tool to differentiate vector fields

Let 𝔛(M) be the set of smooth vector fields on M.
Given V ∈ 𝔛(M) and smooth f, (Vf)(x) = Df(x)[V(x)].

A map ∇: 𝔛(M) × 𝔛(M) → 𝔛(M) is a connection if:
1. ∇_{fU+gV} W = f ∇_U W + g ∇_V W
2. ∇_U (aV + bW) = a ∇_U V + b ∇_U W   (a, b ∈ ℝ)
3. ∇_U (fV) = (Uf)V + f ∇_U V

Example: for M = E,  (∇_U V)(x) = DV(x)[U(x)]
Example: for M ⊆ E,  (∇_U V)(x) = Proj_x(DV(x)[U(x)])
Riemannian connections: a unique and favorable choice

Let M be a Riemannian manifold.
Claim: there exists a unique connection ∇ on M such that, additionally:

4. (∇_U V − ∇_V U)f = U(Vf) − V(Uf)
5. U⟨V, W⟩ = ⟨∇_U V, W⟩ + ⟨V, ∇_U W⟩

It is called the Riemannian connection (Levi-Civita).

Claim: if M is a Riemannian submanifold of E, then

    (∇_U V)(x) = Proj_x(DV(x)[U(x)])

is the Riemannian connection on M.
Riemannian Hessians

Claim: (∇_U V)(x) depends on U only through U(x).
This justifies the notation ∇_u V; e.g.: ∇_u V = Proj_x(DV(x)[u]).

Define: the Riemannian Hessian of f: M → ℝ at x is the linear operator from T_x M to T_x M defined by:

    Hess f(x)[u] = ∇_u grad f

where ∇ is the Riemannian connection.

Claim: Hess f(x) is self-adjoint.
Claim: if x is a local minimum, then Hess f(x) ⪰ 0.
Hessians on Riemannian submanifolds
    Hess f(x)[u] = ∇_u grad f

On a Riemannian submanifold of a linear space,

    ∇_u V = Proj_x(DV(x)[u])

Combining:

    Hess f(x)[u] = Proj_x(D(grad f)(x)[u])
Hessians on Riemannian submanifolds (2)
    Hess f(x)[u] = Proj_x(D(grad f)(x)[u])

Example: f(x) = (1/2)xᵀAx on the sphere in ℝ^n.

    ξ(x) = grad f(x) = Proj_x(Ax) = Ax − (xᵀAx)x
    Dξ(x)[u] = Proj_x(Au) − (xᵀAx)u − (xᵀAu)x
    Hess f(x)[u] = Proj_x(Au) − (xᵀAx)u

Remarkably, grad f(x) = 0 and Hess f(x) ⪰ 0 iff x is optimal.
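Both the projection formula for the Hessian and its self-adjointness can be verified numerically for this example. The helper names (`proj`, `rhess`) are mine; this is a sketch.

```python
import numpy as np

# Sanity checks for the sphere example:
# Hess f(x)[u] = Proj_x(A u) - (x^T A x) u should equal the projected
# derivative of the gradient field, and be self-adjoint on T_x M.

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2

proj = lambda x, w: w - (x @ w) * x
rhess = lambda x, u: proj(x, A @ u) - (x @ A @ x) * u

x = rng.standard_normal(4); x /= np.linalg.norm(x)
u = proj(x, rng.standard_normal(4))
w = proj(x, rng.standard_normal(4))

# Hess f(x)[u] = Proj_x(D xi(x)[u]) with xi = grad f:
Dxi = A @ u - 2 * (x @ A @ u) * x - (x @ A @ x) * u
print(np.allclose(rhess(x, u), proj(x, Dxi)))

# Self-adjoint with respect to the metric on T_x M:
print(np.isclose(w @ rhess(x, u), u @ rhess(x, w)))
```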
Newton, Taylor and Riemann

Now that we have a Hessian, we might guess:

    f(R_x(v)) ≈ m_x(v) = f(x) + ⟨grad f(x), v⟩_x + (1/2)⟨v, Hess f(x)[v]⟩_x

If Hess f(x) is invertible, m_x: T_x M → ℝ has one critical point, the solution of this linear system:

    Hess f(x)[v] = −grad f(x) for v ∈ T_x M

Newton's method on M: x_next = R_x(v).

Claim: if Hess f(x*) ≻ 0, quadratic local convergence.
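A sketch of one way this can look on the sphere, again with f(x) = ½ xᵀAx: represent the Hessian as a matrix on the tangent space, solve the Newton system with a pseudo-inverse (the operator is singular in the normal direction), and retract. The setup (diagonal A, the starting point, the helper `newton_step`) is my choice for illustration.

```python
import numpy as np

# Newton's method on the sphere for f(x) = 0.5 x^T A x (a sketch):
# at each step, solve Hess f(x)[v] = -grad f(x) on T_x M, then retract.
# With A diagonal, the minimizer is the first coordinate axis.

A = np.diag([-2.0, 0.0, 1.0, 3.0])
n = 4

def newton_step(x):
    P = np.eye(n) - np.outer(x, x)                  # projector onto T_x M
    g = P @ (A @ x)                                 # Riemannian gradient
    H = P @ (A - (x @ A @ x) * np.eye(n)) @ P       # Hessian on T_x M
    v = -np.linalg.pinv(H) @ g                      # tangent Newton step
    y = x + v
    return y / np.linalg.norm(y)                    # retraction

x = np.array([0.9, 0.3, 0.2, 0.25])
x /= np.linalg.norm(x)                              # start near the optimum
for _ in range(8):
    x = newton_step(x)

g = A @ x - (x @ A @ x) * x
print(np.linalg.norm(g))   # tiny: converged to a critical point
```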
We need one more toolβ¦
The truncated expansion

    f(R_x(v)) ≈ f(x) + ⟨grad f(x), v⟩_x + (1/2)⟨v, Hess f(x)[v]⟩_x

is not quite correct (well, not always…).

To see why, let's expand f along some smooth curve.
We need one more tool… (2)

Let's expand f along some smooth curve.

With c: I → M such that c(0) = x and c′(0) = v, the composition g = f ∘ c maps I → ℝ, so Taylor holds:

    g(t) ≈ g(0) + t g′(0) + (t²/2) g″(0)

    g(0) = f(x)
    g′(t) = Df(c(t))[c′(t)] = ⟨grad f(c(t)), c′(t)⟩_{c(t)}
    g′(0) = ⟨grad f(x), v⟩_x
    g″(0) = ???
Differentiating vector fields along curves

A map Z: I → TM is a vector field along c: I → M if Z(t) is a tangent vector at c(t).
Let 𝔛(c) denote the set of smooth such fields.

Claim: there exists a unique operator D/dt: 𝔛(c) → 𝔛(c) such that:
1. D/dt (aY + bZ) = a D/dt Y + b D/dt Z   (a, b ∈ ℝ)
2. D/dt (gZ) = g′Z + g D/dt Z
3. D/dt (V ∘ c)(t) = ∇_{c′(t)} V
4. d/dt ⟨Y(t), Z(t)⟩_{c(t)} = ⟨D/dt Y(t), Z(t)⟩_{c(t)} + ⟨Y(t), D/dt Z(t)⟩_{c(t)}

where ∇ is the Riemannian connection.
Differentiating fields along curves (2)

Claim: if M is a Riemannian submanifold of E,

    D/dt Z(t) = Proj_{c(t)}(d/dt Z(t)).
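Property 3 and the submanifold formula can be checked together numerically on the sphere: for Z(t) = grad f(c(t)) along c(t) = R_x(tv), projecting a finite-difference estimate of Z′(0) should recover Hess f(x)[v], since c′(0) = v. Helper names are mine; this is a sketch.

```python
import numpy as np

# Numerical check (a sketch): for Z(t) = grad f(c(t)) along
# c(t) = R_x(t v) on the sphere,
#   D/dt Z(0) = Proj_x(Z'(0)) = Hess f(x)[v],  since c'(0) = v.

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2

proj = lambda x, w: w - (x @ w) * x
rgrad = lambda x: proj(x, A @ x)
rhess = lambda x, u: proj(x, A @ u) - (x @ A @ x) * u
retract = lambda x, v: (x + v) / np.linalg.norm(x + v)

x = rng.standard_normal(4); x /= np.linalg.norm(x)
v = proj(x, rng.standard_normal(4))

t = 1e-6   # central difference estimate of Z'(0)
Zdot = (rgrad(retract(x, t * v)) - rgrad(retract(x, -t * v))) / (2 * t)
print(np.allclose(proj(x, Zdot), rhess(x, v), atol=1e-4))
```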
With g(t) = f(c(t)) and g′(t) = ⟨grad f(c(t)), c′(t)⟩_{c(t)}:

    g″(t) = ⟨D/dt grad f(c(t)), c′(t)⟩_{c(t)} + ⟨grad f(c(t)), D/dt c′(t)⟩_{c(t)}
          = ⟨∇_{c′(t)} grad f, c′(t)⟩_{c(t)} + ⟨grad f(c(t)), c″(t)⟩_{c(t)}

    g″(0) = ⟨Hess f(x)[v], v⟩_x + ⟨grad f(x), c″(0)⟩_x
Putting it together:

    f(c(t)) = f(x) + t ⟨grad f(x), v⟩_x + (t²/2) ⟨Hess f(x)[v], v⟩_x
                   + (t²/2) ⟨grad f(x), c″(0)⟩_x + O(t³)

The annoying term vanishes at critical points and for special curves (special retractions). Mostly fine for optimization.
Trust-region method: Newton's with a safeguard

With the same tools, we can design a Riemannian trust-region method: RTR (Absil, Baker & Gallivan '07).

It approximately minimizes a quadratic model of f ∘ R_{x_k} restricted to a ball in T_{x_k} M, with a dynamic radius.

Complexity is known. Excellent performance, also with an approximate Hessian (e.g., finite differences of grad f).

In Manopt, call trustregions(problem).
In the lecture notes:
• Proofs for all claims in these slides
• References to the (growing) literature
• Short descriptions of applications
• Fully worked out manifolds
  E.g.: Stiefel, fixed-rank matrices, general {x ∈ E : h(x) = 0}
• Details about computation, pitfalls and tricks
  E.g.: how to compute gradients, checkgradient, checkhessian
• Theory for general manifolds
• Theory for quotient manifolds
• Discussion of more advanced geometric tools
  E.g.: distance, exp, log, transports, Lipschitz, finite differences
• Basics of geodesic convexity
nicolasboumal.net/book