Two Geometric Problems in Optimal Transport: Discrete and Gaussian Measures
Yoav Zemel∗
Statistical Laboratory, University of Cambridge
yz668@cam.ac.uk
∗Support from the Swiss National Science Foundation grant 178220 is gratefully acknowledged
Optimal Transport: Fast Probabilistic Approximations with Exact Solvers
joint work with Max Sommerfeld, Jörn Schrieber & Axel Munk
Georg-August-Universität Göttingen
Optimal coupling and the Monge–Kantorovich problem
$(\mathcal{X}, d)$ separable metric space, $p \ge 1$.
Monge–Kantorovich problem & Wasserstein distance
Let X ∼ µ and Y ∼ ν be random elements on (X , d) and define
$$W_p(X, Y) \equiv W_p(\mu, \nu) := \left[ \inf_{Z_1 \overset{d}{=} X,\ Z_2 \overset{d}{=} Y} \mathbb{E}[d^p(Z_1, Z_2)] \right]^{1/p}.$$
Minimise over all random elements $(Z_1, Z_2)$ on $\mathcal{X} \times \mathcal{X}$ with $Z_1 \overset{d}{=} X$ and $Z_2 \overset{d}{=} Y$.
Defines a metric on probability measures on $\mathcal{X}$ (with finite $p$-th moments)
Probability uses: metrises weak convergence + convergence of $p$-th moments; easy to bound; subadditive
Statistical uses: goodness of fit / deformation models / registration of point processes / TDA / neural networks / ...
Takes into account the geometry of $(\mathcal{X}, d)$: $W_p(\delta_{x_0}, \delta_{y_0}) = d(x_0, y_0)$
Close to the human perception of similarity of images
Difficult to compute
A subsampling approach
Focus on a finite metric space (X , d) of size N
Computational complexity $O(N^3)$ is prohibitive: take images of size $1024 \times 1024$. Then $N = 1024^2 > 10^6$, and at $10^{11}$ operations per second one needs $10^{18}/10^{11} = 10^7$ seconds > 16 weeks
We propose a subsampling scheme: sample $S \ll N$ points from the measures $\mu$ and $\nu$ and compute the distance between $\mu_S$ and $\nu_S$ (see the sketch below)
Repeat this B times
Computation time $O(BS^3)$
S (and B) controls the computational-statistical tradeoff: large S yields better approximations, small S is fast to compute
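A minimal sketch of the scheme (illustrative, not the authors' implementation; it exploits the fact that optimal transport between two uniform empirical measures of equal size reduces to an assignment problem, solvable exactly in $O(S^3)$; the function names are ours):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_p(x, y, p=2):
    """Exact W_p between two uniform empirical measures of equal size,
    solved as an assignment problem (O(S^3))."""
    # pairwise Euclidean distances, raised to the p-th power
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** p
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean() ** (1.0 / p)

def subsampled_wasserstein(x, mu, y, nu, S=1000, B=5, p=2, rng=None):
    """Average of B subsampled distances: draw S points from each measure
    (with replacement, proportionally to the weights) and solve exactly."""
    rng = np.random.default_rng(rng)
    estimates = []
    for _ in range(B):
        xs = x[rng.choice(len(x), size=S, p=mu)]
        ys = y[rng.choice(len(y), size=S, p=nu)]
        estimates.append(wasserstein_p(xs, ys, p))
    return np.mean(estimates)
```

For images, `x` would be the grid of pixel coordinates and `mu` the normalised intensities; any exact solver can be substituted for the assignment step.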
[Figure: N = 10000 pixels; subsample of S = 1000 points; W = 0.0208.]
[Figure: N = 10000 pixels; subsample of S = 1000 points; W = 0.0211.]
A subsampling approach
True distance 0.0217, approximation 0.0209: 3.7% relative error. Exact computation time 5 minutes, subsampled 2.4 seconds: 125 times faster.
[Figure: mean relative approximation error (1% to 100%) against runtime relative to the exact algorithm ($10^{-5}$ to $10^{0}$), for problem sizes 32x32, 64x64, 128x128 and sample sizes S = 100, 500, 1000, 2000, 4000.]
Theoretical guarantees?
Error bounds
$(\mathcal{X}, d)$ metric space of cardinality $N = |\mathcal{X}|$
$\mathcal{N}(\mathcal{X}, \delta)$ is the number of $\delta$-balls (centred at points of $\mathcal{X}$) needed to cover $\mathcal{X}$
Sommerfeld, Schrieber, Z. & Munk (2019, JMLR) show that
$$\mathbb{E}\big[W_p^p(\mu, \mu_S)\big] \le (\operatorname{diam}(\mathcal{X}))^p\, \mathcal{E}\, S^{-1/2}, \qquad \mathcal{E} = \mathcal{E}(p, \mathcal{X}),$$
$$\mathcal{E} = 2^{p-1} \inf_{q \ge 2}\, \inf_{l_{\max} \in \mathbb{N}}\, q^{2p} \Bigg( \frac{\sqrt{N}}{q^{(l_{\max}+1)p}} + \sum_{l=0}^{l_{\max}} q^{-lp} \underbrace{\sqrt{\mathcal{N}(\mathcal{X}, q^{-l}\operatorname{diam}(\mathcal{X}))}}_{=O(q^{lD/2}),\ \mathcal{X} \subset (\mathbb{R}^D, \|\cdot\|)} \Bigg),$$
$$\mathbb{E}\,\big|W_p(\mu_S, \nu_S) - W_p(\mu, \nu)\big| \le 2\operatorname{diam}(\mathcal{X})\, \mathcal{E}^{1/p}\, S^{-1/(2p)}.$$
The repetition number B cannot improve the bias; it only appears in deviation bounds
The proof is based on majorising $(\mathcal{X}, d)$ with an ultrametric tree, following Boissard & Le Gouic (2014) and Fournier & Guillin (2015), and using the explicit formula on ultrametric spaces (Kloeckner, 2015).
Euclidean error bounds
For $\mathcal{X} \subset (\mathbb{R}^D, \|\cdot\|_2)$, $\mathbb{E}[W_p^p(\mu, \mu_S)]$ is bounded above by
$$S^{-1/2}\, D^{p/2}\, 2^{3p-1}\, (\operatorname{diam}(\mathcal{X}))^p\, \alpha(D, p) \times \begin{cases} 1 & D < 2p, \\ \log_2 N & D = 2p, \\ N^{1/2 - p/D} & D > 2p. \end{cases}$$
$\alpha(D, p)$ is explicit and $\le 3 + \sqrt{2}$ ($p \in \mathbb{N}$)
The power of N is $< 1/2$, so the error can vanish even when $S \ll N$
In low dimensions the error does not even depend on N
Similar bounds hold for any other norm on $\mathbb{R}^D$
The dependence on S and N is optimal when $D > 2p$: there are large families of measures $\mu$ on N points such that
$$\mathbb{E}[W_p^p(\mu, \mu_S)] \ge S^{-1/2}\, \beta(D, p)\, N^{1/2 - p/D}$$
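To make the case distinction concrete (this is just the display above specialised, with $C(p) = 2^{p/2}\, 2^{3p-1}\, \alpha(2, p)$): for images, i.e. measures supported on a grid in dimension $D = 2$,
$$\mathbb{E}[W_p^p(\mu, \mu_S)] \le C(p)\, (\operatorname{diam}(\mathcal{X}))^p \times \begin{cases} S^{-1/2} \log_2 N, & p = 1 \quad (D = 2p), \\ S^{-1/2}, & p \ge 2 \quad (D < 2p), \end{cases}$$
so for $p \ge 2$ the approximation error does not depend on the number of pixels $N$ at all.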
Summary
We propose a probabilistic meta-algorithm approach that
1 is extremely easy to implement and to tune towards higher accuracy or shorter computation time, as desired
2 can be used with any algorithm for transportation problems as a back-end, including general LP solvers, specialized network solvers and algorithms using entropic penalization
3 comes with theoretical non-asymptotic guarantees for the approximation error of the Wasserstein distance; in particular, this error is independent of the size of the original problem in many important cases, including images
4 works well in practice. For example, the Wasserstein distance between two $128^2$-pixel images can typically be approximated with a relative error of less than 5% in only 1% of the time required for exact computation

Sommerfeld, M., Schrieber, J., Zemel, Y. & Munk, A. (2019). Optimal transport: Fast probabilistic approximations with exact solvers. Journal of Machine Learning Research 20(105):1-23.
The Procrustes Metric on Covariance Operators is Optimal Transport: Statistical Implications
joint work with Valentina Masarotto & Victor M. Panaretos
École Polytechnique Fédérale de Lausanne
Outline
1 Procrustes metric on covariance operators
2 Optimal coupling of Gaussian processes
3 1 = 2
4 So what? In other words: (some) statistical applications
1. Covariance operators in functional data analysis
Covariance operators in functional data analysis
Aim: nonparametric inference on a random function $X(t)$, $t \in [0, 1]$
Data: N identically distributed realisations $\{X_j(t)\}_{j=1}^N$
Setup: view X as a random element of $L^2[0, 1]$ (or a separable Hilbert space $\mathcal{X}$)
Basic objects (and only objects if X is Gaussian):
1 Covariance operator $S : L^2[0, 1] \to L^2[0, 1]$:
$$[Sf](t) = \int_0^1 \operatorname{Cov}[X(t), X(s)]\, f(s)\, ds$$
2 Karhunen-Loève expansion: for the eigendecomposition $(\lambda_n, \varphi_n)$ of S (that is, $(\varphi_n)_n$ is an orthonormal basis and $S\varphi_n = \lambda_n \varphi_n$),
$$X(t) - \mathbb{E}X(t) = \sum_{n=1}^{\infty} \xi_n \varphi_n(t),$$
where $\xi_n = \langle X - \mathbb{E}X, \varphi_n \rangle$ are zero-mean and uncorrelated with variance $\lambda_n$.
In practice, one observes discrete measurements $X_j(t_k) + \varepsilon_{jk}$ on a grid $(t_k)$ (see the simulation sketch below).
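As an illustration, such data can be simulated from a truncated KL expansion; the sketch below uses the Brownian-motion eigenpairs $\lambda_n = (\pi(n - 1/2))^{-2}$, $\varphi_n(t) = \sqrt{2}\sin(\pi(n - 1/2)t)$ purely as an example choice of $S$:

```python
import numpy as np

def simulate_kl(n_curves, t, n_modes=50, noise_sd=0.0, rng=None):
    """Simulate Gaussian curves X_j(t_k) + eps_jk via a truncated
    Karhunen-Loeve expansion (Brownian-motion eigenpairs as an example)."""
    rng = np.random.default_rng(rng)
    n = np.arange(1, n_modes + 1)
    lam = 1.0 / (np.pi * (n - 0.5)) ** 2                      # eigenvalues lambda_n
    phi = np.sqrt(2) * np.sin(np.pi * np.outer(t, n - 0.5))   # phi_n(t_k), columns
    xi = rng.standard_normal((n_curves, n_modes)) * np.sqrt(lam)  # KL scores xi_n
    X = xi @ phi.T                                            # curves on the grid
    return X + noise_sd * rng.standard_normal(X.shape)

t_grid = np.linspace(0, 1, 101)
X = simulate_kl(200, t_grid, rng=0)
S_hat = np.cov(X, rowvar=False)   # empirical covariance on the grid
```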
Metrics on covariance operators
Since covariance operators $S_i$ are Hilbert-Schmidt, one can use the induced norm to contrast them:
$$d(S_i, S_j) = |||S_i - S_j|||_2, \qquad |||S|||_2 = \sqrt{\operatorname{trace}[S^* S]}.$$
This implies a linear model for the covariances,
$$S_i = S + \Delta_i, \qquad \mathbb{E}\Delta_i = 0.$$
Is this metric compatible with their non-linear nature?

Procrustes metric on covariances (Pigoli et al., 2014)
For two covariance operators $S_1, S_2 : \mathcal{X} \to \mathcal{X}$ on a separable Hilbert space $\mathcal{X}$, define the Procrustes distance as
$$\Pi(S_1, S_2) = \inf_{U^* U = I} |||S_1^{1/2} - S_2^{1/2} U|||_2,$$
where I is the identity and $\{U : U^* U = I\}$ is the collection of unitary operators.

Generalises the matrix version (Dryden et al., 2009), motivated by statistical shape theory
2. Optimal coupling of Gaussian processes
Optimal coupling and the Monge–Kantorovich problem
Monge–Kantorovich problem & Wasserstein distance
Let $X \sim \mu$ and $Y \sim \nu$ be random elements on $(\mathcal{X}, \|\cdot\|)$ and define
$$W_2(X, Y) \equiv W_2(\mu, \nu) := \sqrt{\inf_{Z_1 \overset{d}{=} X,\ Z_2 \overset{d}{=} Y} \mathbb{E}\|Z_1 - Z_2\|^2}$$
over all random elements $(Z_1, Z_2)$ on $\mathcal{X} \times \mathcal{X}$ such that $Z_1 \overset{d}{=} X$ and $Z_2 \overset{d}{=} Y$.
If $\mu$ is regular¹, the optimal coupling $\pi$ is deterministic: it is manifested as the joint distribution of $(X, t(X))$ for some deterministic map $t : \mathcal{X} \to \mathcal{X}$, called an optimal transport map.
The optimal deterministic map exists uniquely when the departure measure is regular, and is characterised as the gradient of a convex potential.
Denote the optimal map from $\mu$ to $\nu$ by $t_\mu^\nu$.
¹i.e., vanishes on Gaussian null sets
Optimal coupling of Gaussian measures
For $\mu \equiv N(0, S_1)$ and $\nu \equiv N(0, S_2)$ centred Gaussian measures,
$$W_2^2(\mu, \nu) = \operatorname{trace}(S_1) + \operatorname{trace}(S_2) - 2\operatorname{trace}\big([S_1^{1/2} S_2 S_1^{1/2}]^{1/2}\big).$$
$\mu$ is regular $\iff$ $S_1$ is injective
When $\dim(\mathcal{X}) < \infty$, invertibility of $S_1$ guarantees existence & uniqueness of the deterministic optimal transport map
$$t_{S_1}^{S_2} := t_{N(0,S_1)}^{N(0,S_2)} = S_1^{-1/2}\big(S_1^{1/2} S_2 S_1^{1/2}\big)^{1/2} S_1^{-1/2}.$$
The transport map formula is essentially valid when $\dim(\mathcal{X}) = \infty$:

Existence/uniqueness of optimal maps (Cuesta-Albertos et al., 1996)
Let $\mu \equiv N(0, S_1)$ and $\nu \equiv N(0, S_2)$ be centred Gaussian measures on $\mathcal{X}$. Provided $\operatorname{Ker}(S_1) \subseteq \operatorname{Ker}(S_2)$, there exists a subspace $\mathcal{X}_{\mathrm{sub}} \subseteq \mathcal{X}$ of $\mu$-measure 1 on which the optimal map is well-defined and given by the linear operator
$$t_{S_1}^{S_2} = S_1^{-1/2}\big(S_1^{1/2} S_2 S_1^{1/2}\big)^{1/2} S_1^{-1/2}.$$
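In finite dimensions both the distance and the map are closed-form in the covariance matrices; a minimal sketch (assuming $S_1$ is invertible; the function names are ours):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2(S1, S2):
    """W_2 between N(0, S1) and N(0, S2) via the trace formula."""
    r1 = sqrtm(S1).real
    cross = sqrtm(r1 @ S2 @ r1).real
    val = np.trace(S1) + np.trace(S2) - 2.0 * np.trace(cross)
    return np.sqrt(max(val, 0.0))   # clip tiny negative round-off

def gaussian_ot_map(S1, S2):
    """Optimal map t_{S1}^{S2} = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}."""
    r1 = sqrtm(S1).real
    r1_inv = np.linalg.inv(r1)
    return r1_inv @ sqrtm(r1 @ S2 @ r1).real @ r1_inv
```

A sanity check: with `T = gaussian_ot_map(S1, S2)`, the matrix `T @ S1 @ T` recovers `S2` (T is self-adjoint), i.e. T pushes $N(0, S_1)$ forward to $N(0, S_2)$.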
(Pseudo-)Riemannian geometry of the Wasserstein space
Fix a regular $\mu$, so $t_\mu^\nu$ exists for any measure $\nu$.
Unique geodesic between $\mu$ and $\nu$ (McCann's interpolant):
$$[s\, t_\mu^\nu + (1 - s) I]_{\#}\mu, \qquad s \in [0, 1].$$
Tangent space, exponential map (surjective) and log map:
$$\operatorname{Tan}_\mu = \overline{\{s(t - I) : t \text{ optimal between } \mu \text{ and } t_{\#}\mu;\ s > 0\}}^{L^2(\mu)} = \overline{\{\nabla \varphi : \varphi \in \mathrm{Cyl}_c^{\infty}(\mathcal{X})\}}^{L^2(\mu)}$$
$$\exp_\mu(r - I) = r_{\#}\mu, \qquad \log_\mu(\nu) = t_\mu^\nu - I$$
[see Ambrosio, Gigli & Savaré, 2008; figure from Choi et al., 2015]
3. Putting things together
Wasserstein ≡ Procrustes
Equivalence [Masarotto, Panaretos & Z., 2019]
The Procrustes distance between two trace-class covariance operators $S_1$ and $S_2$ on $\mathcal{X}$ coincides with the Wasserstein distance between the Gaussian measures $N(0, S_1)$ and $N(0, S_2)$ on $\mathcal{X}$:
$$\Pi(S_1, S_2) = \inf_{U^* U = I} |||S_1^{1/2} - S_2^{1/2} U|||_2 = \sqrt{\operatorname{trace}\big(S_1 + S_2 - 2[S_2^{1/2} S_1 S_2^{1/2}]^{1/2}\big)} = W_2(N(0, S_1), N(0, S_2)).$$
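The identity is easy to verify numerically in finite dimensions, since the infimum over U is a classical orthogonal Procrustes problem solvable by SVD; a sketch (illustrative verification, not part of the proof):

```python
import numpy as np
from scipy.linalg import sqrtm, orthogonal_procrustes

rng = np.random.default_rng(1)
A, B = rng.standard_normal((2, 5, 5))
S1, S2 = A @ A.T, B @ B.T               # two covariance matrices

r1, r2 = sqrtm(S1).real, sqrtm(S2).real
# inf_U ||r1 - r2 U||_F: orthogonal Procrustes problem, solved via SVD
U, _ = orthogonal_procrustes(r2, r1)
procrustes = np.linalg.norm(r1 - r2 @ U)

cross = sqrtm(r2 @ S1 @ r2).real
wasserstein = np.sqrt(np.trace(S1 + S2 - 2 * cross))
assert np.isclose(procrustes, wasserstein)   # Pi(S1,S2) == W2(N(0,S1), N(0,S2))
```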
Almost Riemannian geometry on covariance operators
The tangent space at $S$ is (a Hilbert space)
$$\operatorname{Tan}_S = \overline{\{A : A = A^*,\ |||S^{1/2} A|||_2 < \infty\}},$$
where the closure is with respect to the associated inner product
$$\langle A, B \rangle_S = \operatorname{trace}[ASB] = \mathbb{E}\langle AX, BX \rangle, \qquad X \sim N(0, S)$$
Exponential map:
$$\exp_S(A) = (A + I)\, S\, (A + I)$$
The injectivity condition $\ker(S_0) \subseteq \ker(S_1)$ suffices for
1 existence of the log map
$$\log_{S_0}(S_1) = t_0^1 - I = S_0^{-1/2}[S_0^{1/2} S_1 S_0^{1/2}]^{1/2} S_0^{-1/2} - I$$
2 a unique minimal geodesic (sketched in code below)
$$S_t = t^2 S_1 + (1 - t)^2 S_0 + t(1 - t)\big[t_0^1 S_0 + S_0 t_0^1\big], \qquad t_0^1 = t_{S_0}^{S_1} = S_0^{-1/2}[S_0^{1/2} S_1 S_0^{1/2}]^{1/2} S_0^{-1/2}$$
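A sketch of the geodesic between two covariance matrices, using the closed form above (assuming $S_0$ invertible):

```python
import numpy as np
from scipy.linalg import sqrtm

def covariance_geodesic(S0, S1, t):
    """Point at time t on the Wasserstein geodesic from S0 to S1:
    S_t = t^2 S1 + (1-t)^2 S0 + t(1-t)(t01 S0 + S0 t01)."""
    r0 = sqrtm(S0).real
    r0_inv = np.linalg.inv(r0)
    t01 = r0_inv @ sqrtm(r0 @ S1 @ r0).real @ r0_inv   # optimal map S0 -> S1
    return (t**2) * S1 + ((1 - t)**2) * S0 + t * (1 - t) * (t01 @ S0 + S0 @ t01)
```

By construction the curve interpolates the endpoints: `covariance_geodesic(S0, S1, 0)` returns $S_0$ and `covariance_geodesic(S0, S1, 1)` returns $S_1$.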
Topology
Procrustes topology [Masarotto, Panaretos & Z., 2019]
The following are equivalent for covariance operators $(S_n)_{n=1}^{\infty}$, $S$ on $\mathcal{X}$:
$\Pi(S_n, S) \to 0$
$N(0, S_n) \to N(0, S)$ in distribution
$|||S_n - S|||_1 \to 0$, where $|||S|||_1 = \operatorname{trace}([S^* S]^{1/2})$ is the trace norm
$|||S_n^{1/2} - S^{1/2}|||_2 \to 0$
Corollary: stability to finite-dimensional approximations.
Uniform stability under projections
Uniform stability [Masarotto, Panaretos & Z., 2019]
Let $\{e_k\}_{k \ge 1}$ be an orthonormal basis of $\mathcal{X}$ and $P_n = \sum_{j=1}^{n} e_j \otimes e_j$ the projection onto the span of $\{e_1, \ldots, e_n\}$. Let $\mathcal{B}$ be $\Pi$-compact. Then
$$\sup_{S_1, S_2 \in \mathcal{B}} \big| \Pi(P_n S_1 P_n, P_n S_2 P_n) - \Pi(S_1, S_2) \big| \to 0, \qquad n \to \infty.$$
To construct a $\Pi$-compact $\mathcal{B}$, note that $\Pi$-compactness is equivalent to:
$\mathcal{B}$ is $|||\cdot|||_1$-compact
$\sqrt{\mathcal{B}}$ is $|||\cdot|||_2$-compact
Fréchet means (barycentres)
The Fréchet mean of a collection $S_1, \ldots, S_N$ of covariance operators is
$$\bar{S} \in \arg\min_S \sum_{i=1}^{N} \Pi^2(S, S_i)$$
Always exists
Unique if one $S_i$ is injective
"Swells" less than the arithmetic mean: $(S_1 + \cdots + S_N)/N - \bar{S} \ge 0$ (a nonnegative operator)
Can be computed by steepest descent

Stability [Masarotto, Panaretos & Z., 2019]
Let $S_i^k \to S_i$ as $k \to \infty$, for all $i = 1, \ldots, N$.
Let $\bar{S}^k$ be a Fréchet mean of $(S_1^k, \ldots, S_N^k)$.
If the Fréchet mean $\bar{S}$ of $(S_1, \ldots, S_N)$ is unique, then $\bar{S}^k \to \bar{S}$.
4. Two statistical applications
Tangent space functional principal component analysis
Once the Fréchet mean $\bar{S}$ is found, one can do principal component analysis on the tangent space:
Lift to the tangent space:
$$\log_{\bar{S}}(S_i) = t_{\bar{S}}^{S_i} - I = \bar{S}^{-1/2}[\bar{S}^{1/2} S_i \bar{S}^{1/2}]^{1/2} \bar{S}^{-1/2} - I \in \operatorname{Tan}_{\bar{S}}$$
(requires existence of the maps $t_{\bar{S}}^{S_i}$)
Since $\operatorname{Tan}_{\bar{S}}$ is a Hilbert space, principal component analysis amounts to:
1 Constructing the tangent space empirical covariance
$$\frac{1}{N} \sum_{i=1}^{N} \big(t_{\bar{S}}^{S_i} - I\big) \otimes_{\bar{S}} \big(t_{\bar{S}}^{S_i} - I\big)$$
2 Extracting eigenvectors on $\operatorname{Tan}_{\bar{S}}$ and retracting their span via $\exp$
Optimal maps
Let $S_1, \ldots, S_N$ be covariances with Fréchet mean $\bar{S}$
Transport maps $t_{\bar{S}}^{S_i}$ exist if...

Conjecture
If $S_1, \ldots, S_N$ are injective, then so is $\bar{S}$
True in finite dimensions
True if they commute

Trouble can be avoided, though:

Theorem (Masarotto, Panaretos & Z., in progress)
Transport maps $t_{\bar{S}}^{S_i}$ exist as bounded operators, with
$$|||t_{\bar{S}}^{S_i}|||_\infty \le N, \qquad |||A|||_\infty = \sup_{x \in \mathcal{X} \setminus \{0\}} \frac{\|Ax\|}{\|x\|}$$
↪ tangent space principal component analysis is well-defined
Testing homogeneity for covariances
Have samples $X_{i,1}, \ldots, X_{i,K}$ with covariances $S_i$
Wish to test $H_0 : S_1 = \cdots = S_N$
Key idea: rewrite $H_0$ as
$$t_{\bar{S}}^{S_i} - I := \Delta_i = 0, \qquad i = 1, \ldots, N.$$
Test statistic:
$$T_r = \sum_{i=1}^{N} |||\Delta_i|||_r^2, \qquad r \in \{1, 2, \infty\}.$$
Reject for high values
Calibrate using permutations (see the sketch below)
More powerful than other methods
Simulation setup taken from Cabassi et al. (2017): $k_1$ groups have covariance $(1 + \gamma) S_m$ and $k_2 = N - k_1$ groups have covariance $S_m$, the male covariance operator of the Berkeley growth dataset
$K = 20$ curves generated from each group
Compare power with the "pairwise test" of Cabassi et al. (2017)
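A sketch of the test with permutation calibration (illustrative, not the authors' exact implementation; it reuses `gaussian_ot_map` from the coupling slide and `frechet_mean` as sketched later on the Computation slide):

```python
import numpy as np

def test_statistic(covs, r=2):
    """T_r = sum_i |||t_{Sbar}^{Si} - I|||_r^2 over estimated group covariances."""
    Sbar = frechet_mean(covs)
    I = np.eye(Sbar.shape[0])
    stat = 0.0
    for S in covs:
        Delta = gaussian_ot_map(Sbar, S) - I
        if r == 2:                                   # Hilbert-Schmidt norm
            stat += np.sum(Delta**2)
        elif r == np.inf:                            # operator norm
            stat += np.linalg.norm(Delta, ord=2) ** 2
        else:                                        # trace norm (r = 1)
            stat += np.linalg.norm(Delta, ord='nuc') ** 2
    return stat

def permutation_test(samples, n_perm=999, r=2, rng=None):
    """samples: list of (K_i x grid) arrays of curves, one per group."""
    rng = np.random.default_rng(rng)
    sizes = [len(s) for s in samples]
    pooled = np.vstack(samples)
    observed = test_statistic([np.cov(s, rowvar=False) for s in samples], r)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))          # reshuffle group labels
        groups = np.split(pooled[perm], np.cumsum(sizes)[:-1])
        count += test_statistic([np.cov(g, rowvar=False) for g in groups], r) >= observed
    return (1 + count) / (1 + n_perm)                # permutation p-value
```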
Additive perturbations: Gaussian marginals
[Figure: power against γ in four panels (k1=4, k2=4; k1=1, k2=7; k1=1, k2=3; k1=2, k2=2), comparing the pairwise test with the transport map test.]
Additive perturbations: Student marginals
[Figure: power against γ in four panels (k1=4, k2=4; k1=1, k2=7; k1=1, k2=3; k1=2, k2=2), comparing the pairwise test with the transport map test.]
Geodesic perturbations: Gaussian marginals
[Figure: power against γ in four panels (k1=4, k2=4; k1=1, k2=7; k1=1, k2=3; k1=2, k2=2), comparing the pairwise test with the transport map test.]
Geodesic perturbations: Student marginals
[Figure: power against γ in four panels (k1=4, k2=4; k1=1, k2=7; k1=1, k2=3; k1=2, k2=2), comparing the pairwise test with the transport map test.]
Generative model
The Hilbert-Schmidt distance $|||S_1 - S_2|||_2$ implies the linear model
$$S_i = S + \Delta_i, \qquad \mathbb{E}\Delta_i = 0.$$
What about the Procrustes distance $\Pi$?

Generative model and deformations [MPZ19]
Let $S$ be any covariance operator, and let $t : \mathcal{X} \to \mathcal{X}$ be a random self-adjoint nonnegative operator satisfying
1 $\mathbb{E}|||t|||_\infty^2 < \infty$
2 $\mathbb{E}t = I$
Then $S$ is a $\Pi$-Fréchet mean of the random operator $tSt$:
$$\mathbb{E}\Pi^2(S, tSt) \le \mathbb{E}\Pi^2(S', tSt)$$
for any covariance operator $S'$.

Linear model on $\operatorname{Tan}_S$!
References
1 Sommerfeld, M., Schrieber, J., Zemel, Y. & Munk, A. (2019). Optimal transport: Fast probabilistic approximations with exact solvers. Journal of Machine Learning Research 20(105):1-23.
2 Masarotto, V., Panaretos, V. M. & Zemel, Y. (2019). Procrustes metrics on covariance operators and optimal transportation of Gaussian processes. Invited paper, Special Issue on Manifold Statistics, Sankhya A 81(1):172-213.
3 Schrieber, J., Schuhmacher, D. & Gottschlich, C. (2016). DOTmark: A benchmark for discrete optimal transport. IEEE Access 5:271-282.
4 Peyré, G. & Cuturi, M. (2019). Computational optimal transport. Foundations and Trends in Machine Learning.
5 Villani, C. (2008). Optimal Transport: Old and New. Springer.
6 Panaretos, V. M. & Zemel, Y. (2019). Statistical aspects of Wasserstein distances. Annual Review of Statistics and Its Application 6:405-431.
7 Panaretos, V. M. & Zemel, Y. (2018+). An Invitation to Statistics in Wasserstein Space. SpringerBriefs in Probability and Mathematical Statistics (in press).
8 Bigot, J. (2019). Statistical data analysis in the Wasserstein space. arXiv:1907.08417.
$$\alpha(D, p) = \begin{cases} 1/(1 - 2^{D/2 - p}) & D < 2p, \\ 2 + D^{-1} & D = 2p, \\ 2 + 1/(2^{D/2 - p} - 1) & D > 2p, \end{cases} \qquad \alpha(D, p) \le 3 + \sqrt{2} \quad (p, D \in \mathbb{N}).$$
$$\mathbb{P}\left[\, |W_p(\mu_S, \nu_S) - W_p(\mu, \nu)| \ge z + \frac{2\, \mathcal{E}^{1/p}}{S^{1/(2p)}} \,\right] \le 2 \exp\left( -\frac{S B z^{2p}}{8 \operatorname{diam}(\mathcal{X})^{2p}} \right).$$
Computation
In practice, we have finite-rank approximations $S_1, \ldots, S_N$.
Steepest descent: essentially a Procrustean-type algorithm!
(A) For $j = 0$, set $\Gamma_j = S_1 + \cdots + S_N$.
(B) For $i = 1, \ldots, N$, solve the (pairwise) coupling problem and find the optimal transport map $t_{\Gamma_j}^{S_i} = \Gamma_j^{-1/2}(\Gamma_j^{1/2} S_i \Gamma_j^{1/2})^{1/2} \Gamma_j^{-1/2}$ from $\Gamma_j$ to $S_i$.
(C) Define the map $t_j = \frac{1}{N} \sum_{i=1}^{N} t_{\Gamma_j}^{S_i} = \frac{1}{N} \sum_{i=1}^{N} \Gamma_j^{-1/2}(\Gamma_j^{1/2} S_i \Gamma_j^{1/2})^{1/2} \Gamma_j^{-1/2}$.
(D) Set $\Gamma_{j+1} = t_j\, \Gamma_j\, t_j$.
(E) Iterate (B)-(D).
Provably converges to the unique Fréchet mean when $\dim(\mathcal{X}) < \infty$.
Very stable/fast in practice.
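A minimal matrix sketch of steps (A)-(E) (assuming the $S_i$ are strictly positive definite; the arithmetic-mean initialisation and the tolerance are illustrative choices, not prescribed by the slide):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_mean(covs, tol=1e-10, max_iter=100):
    """Fixed-point / steepest-descent iteration for the Wasserstein
    (Procrustes) Frechet mean of covariance matrices."""
    N = len(covs)
    Gamma = sum(covs) / N                 # (A) initial template (illustrative)
    for _ in range(max_iter):
        r = sqrtm(Gamma).real
        r_inv = np.linalg.inv(r)
        # (B)-(C) average of the optimal maps t_{Gamma}^{S_i}
        t_bar = sum(r_inv @ sqrtm(r @ S @ r).real @ r_inv for S in covs) / N
        Gamma_next = t_bar @ Gamma @ t_bar            # (D) push Gamma forward
        if np.linalg.norm(Gamma_next - Gamma) < tol:  # stop at a fixed point
            break
        Gamma = Gamma_next                # (E) iterate
    return Gamma
```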
Example: Fréchet mean of four covariances
Example: optimal maps from the Fréchet mean
[Figure (Fig. 8 in the article): registration maps from the Fréchet mean to the four measures.]
Steepest descent is a Procrustes algorithm on optimal maps
1 Registration: register each $S_i$ to the current template $\Gamma_j$, via the maps $t_{\Gamma_j}^{S_i}$.
In geometric terms, lift $\{S_i\}_{i=1}^N$ to the tangent space at $\Gamma_j$.
Local linear coordinates (actually global): $t_{\Gamma_j}^{S_i} - I = \log_{\Gamma_j}(S_i)$
2 Averaging: average the registered measures coordinate-wise.
In geometric terms, average the local linear representations $t_{\Gamma_j}^{S_i} - I = \log_{\Gamma_j}(S_i)$.
Then retract the linear average back onto the manifold via the exponential map.
Interpretation: phase variation
Suggests connections with the registration problem in functional data analysis:
1 Interested in a Gaussian process $X \sim N(0, S)$, viewed via its KL expansion
$$X = \sum_{n=1}^{\infty} \sigma_n^{1/2} \xi_n \varphi_n$$
for $\{\sigma_n, \varphi_n\}$ the eigenvalue/eigenfunction pairs of S, and $\xi_n \overset{iid}{\sim} N(0, 1)$
↪ Amplitude variation: superposition of random $N(0, \sigma_n)$ amplitude fluctuations around fixed (deterministic) modes $\varphi_n$
2 Instead, one observes a warped version,
$$\tilde{X} = tX = \sum_{n=1}^{\infty} \sigma_n^{1/2} \xi_n\, t\varphi_n$$
↪ Phase variation²: emanates from deformation fluctuations of the modes $\varphi_n$
3 Tangent PCA + multicoupling: optimal registration! (the $t_{\bar{S}}^{S_i}$ are registration maps!)
²The term "phase" comes from the case $\mathcal{X} = L^2[0, 1]$, $X : [0, 1] \to \mathbb{R}$, where the deformation variation is attributable to the x-axis (abscissa).
Principal component analysis with different inner product
Tangent space data:
$$\Delta_i = \log_{\bar{S}}(S_i) = t_{\bar{S}}^{S_i} - I = \bar{S}^{-1/2}[\bar{S}^{1/2} S_i \bar{S}^{1/2}]^{1/2} \bar{S}^{-1/2} - I \in \operatorname{Tan}_{\bar{S}}$$
Centred: $\Delta_1 + \cdots + \Delta_N = 0$
$\operatorname{Tan}_{\bar{S}}$ is a Hilbert space, but its inner product is not the Hilbert-Schmidt one; rather,
$$\langle A, B \rangle_{\bar{S}} = \operatorname{trace}[A \bar{S} B]$$
Empirical tangent space covariance:
$$K = \frac{1}{N} \sum_{i=1}^{N} \Delta_i \otimes_{\bar{S}} \Delta_i, \qquad (A \otimes_{\bar{S}} B) C = \langle B, C \rangle_{\bar{S}}\, A, \quad A, B, C \in \operatorname{Tan}_{\bar{S}}$$
Eigenvalues and eigenvectors can be found in Hilbert-Schmidt space (Ocaña, Aguilera & Valderrama, 1999):
$KA = \lambda A$ for $A \in \operatorname{Tan}_{\bar{S}}$ if and only if $K_{HS}(\bar{S}^{1/2} A) = \lambda (\bar{S}^{1/2} A)$, with
$$K_{HS} = \frac{1}{N} \sum_{i=1}^{N} \Delta_i \bar{S}^{1/2} \otimes \Delta_i \bar{S}^{1/2}$$
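A sketch of how this embedding can be used in finite dimensions (illustrative only: the Gram-matrix route and all names are ours; eigenvectors of $K_{HS}$ of the form $\bar{S}^{1/2}A$ are mapped back to tangent vectors $A$):

```python
import numpy as np
from scipy.linalg import sqrtm

def tangent_pca(covs, Sbar):
    """PCA of the lifted data Delta_i = t_{Sbar}^{S_i} - I, computed in
    Hilbert-Schmidt space via the embedding A -> Sbar^{1/2} A."""
    r = sqrtm(Sbar).real
    r_inv = np.linalg.inv(r)
    I = np.eye(Sbar.shape[0])
    Deltas = [r_inv @ sqrtm(r @ S @ r).real @ r_inv - I for S in covs]
    M = np.stack([(D @ r).ravel() for D in Deltas])   # Delta_i Sbar^{1/2}, flattened
    # eigendecomposition of K_HS = (1/N) sum_i m_i (x) m_i via the N x N Gram matrix
    G = M @ M.T / len(covs)
    lam, V = np.linalg.eigh(G)
    comps = []
    for k in np.argsort(lam)[::-1]:
        if lam[k] > 1e-12:
            w = M.T @ V[:, k]                 # HS eigenvector, prop. to Sbar^{1/2} A
            w /= np.linalg.norm(w)            # unit HS norm = unit <.,.>_Sbar norm
            comps.append(r_inv @ w.reshape(Sbar.shape))   # back to Tan_Sbar
    return np.sort(lam)[::-1], comps
```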
Interpretation: optimal multicoupling
Coupling several Gaussian measures (multicoupling)
Let $N(0, S_i)$ be Gaussians on $\mathcal{X}$. Construct a random vector $(X_1, \ldots, X_N) \in \mathcal{X}^N$ so that
1 $X_i \sim N(0, S_i)$ for all $i$;
2 for any other random vector $(Y_1, \ldots, Y_N) \in \mathcal{X}^N$ with $Y_i \sim N(0, S_i)$,
$$\sum_{i<j} \mathbb{E}\|X_i - X_j\|^2 \le \sum_{i<j} \mathbb{E}\|Y_i - Y_j\|^2.$$

Answer [MPZ19]
Find the Fréchet mean $\bar{S}$, let $Z \sim N(0, \bar{S})$, and define
$$X_i = t_{\bar{S}}^{S_i} Z = \bar{S}^{-1/2}(\bar{S}^{1/2} S_i \bar{S}^{1/2})^{1/2} \bar{S}^{-1/2} Z.$$
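A sketch of this construction in finite dimensions (reusing the illustrative `frechet_mean` and `gaussian_ot_map` helpers from the earlier slides):

```python
import numpy as np

def multicouple(covs, n_samples=1, rng=None):
    """Optimal multicoupling: X_i = t_{Sbar}^{S_i} Z with Z ~ N(0, Sbar)."""
    rng = np.random.default_rng(rng)
    Sbar = frechet_mean(covs)
    maps = [gaussian_ot_map(Sbar, S) for S in covs]
    Z = rng.multivariate_normal(np.zeros(Sbar.shape[0]), Sbar, size=n_samples)
    # rows of each X_i are marginally N(0, S_i), jointly optimally coupled
    return [Z @ T.T for T in maps]
```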
Collections of covariance operators
n populations with covariance operators
$$S_i : \mathcal{X} \to \mathcal{X}, \qquad i = 1, \ldots, n,$$
from which we observe $N_i$ noisy realisations
$$X_{ijk} = X_{ij}(t_k) + \varepsilon_{ijk}, \qquad i \le n,\ j \le N_i.$$
DNA biophysics (sequence-dependent flexibility; Panaretos et al., 2010)
Computational linguistics (phonetic analysis; Pigoli et al., 2014, 2018)