
STATISTICS ON RIEMANNIAN MANIFOLDS WITH APPLICATIONS TO THE PLANAR SHAPE SPACE

Submitted by: ABHISHEK BHATTACHARYA
Project Advisor: Dr. RABI BHATTACHARYA

November 15, 2006

Abstract

This article presents certain recent methodologies and some new results for the statistical analysis of distributions of shapes on manifolds. An important example considered in some detail here is the 2-D shape space of k-ads, comprising all configurations of $k$ planar landmarks ($k > 2$), modulo translation, scaling and rotation. If one leaves out configurations of $k$ identical points, the planar shape space can be identified with the complex projective space $\mathbb{C}P^{k-2}$.

The statistical analysis of shape distributions based on random samples is important in many areas such as morphometrics (discrimination and classification of biological shapes), medical diagnostics (detection of change or deformation of shapes in some organs due to some disease, for example) and machine vision (e.g., digital recording and analysis based on planar views of 3-D objects, when the position from which the object was viewed or pictured is unknown).

1 Frechet Mean and Variation on Metric Spaces.

Let $(M,d)$ be a metric space and $Q$ a probability measure on $M$. Define the Frechet function of $Q$ as

$$F(p) = \int_M d^2(p,x)\,Q(dx), \qquad p \in M. \tag{1.1}$$

1.1 Frechet Mean

Definition 1.1 Suppose $F(p) < \infty$ for some $p \in M$. Then the set of all $p$ for which $F(p)$ is the minimum value of $F$ on $M$ is called the Frechet mean set of $Q$. If this set is a singleton, its unique element is called the Frechet mean of $Q$.
If $X_1, X_2, \ldots, X_n$ are independent and identically distributed (iid) with common distribution $Q$, and $Q_n \doteq \frac{1}{n}\sum_{j=1}^n \delta_{X_j}$ is the corresponding empirical distribution, then the Frechet mean set of $Q_n$ is called the sample Frechet mean set. If this set is a singleton, it is called the sample Frechet mean.
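To make Definition 1.1 concrete, the following minimal sketch evaluates the sample Frechet function on the unit circle with the arc-length distance and returns its minimizers over a finite candidate grid. The grid, the tolerance `tol` and all function names are assumptions made for illustration; restricting to a grid only approximates the true mean set.

```python
import numpy as np

def arc_distance(a, b):
    """Arc-length distance between angles a and b on the unit circle."""
    diff = np.abs(a - b) % (2 * np.pi)
    return np.minimum(diff, 2 * np.pi - diff)

def sample_frechet_function(candidates, sample, dist):
    """F_n(p) = (1/n) * sum_j d^2(p, X_j), evaluated at each candidate p."""
    d2 = dist(candidates[:, None], sample[None, :]) ** 2
    return d2.mean(axis=1)

def sample_frechet_mean_set(candidates, sample, dist, tol=1e-12):
    """Candidates whose F_n value is within tol of the minimum over the grid."""
    values = sample_frechet_function(candidates, sample, dist)
    return candidates[values <= values.min() + tol]

# Hypothetical candidate grid: one point per degree.
grid = np.arange(360) * (2 * np.pi / 360)

# A clustered sample has a single minimizer on the grid (at 20 degrees).
clustered = np.deg2rad([10.0, 20.0, 30.0])
mean_clustered = sample_frechet_mean_set(grid, clustered, arc_distance)

# Two antipodal points have two minimizers (at 90 and 270 degrees),
# so the sample Frechet mean set is not a singleton.
antipodal = np.deg2rad([0.0, 180.0])
mean_antipodal = sample_frechet_mean_set(grid, antipodal, arc_distance)
```

The antipodal case illustrates why the definition speaks of a mean set: uniqueness of the minimizer is an extra condition, not a consequence of finiteness of $F$.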

Proposition 1.1 Suppose every closed and bounded subset of $M$ is compact. If the Frechet function $F(p)$ of $Q$ is finite for some $p$, then the Frechet mean set of $Q$ is nonempty and compact.

Proof By the triangle inequality,

$$d^2(q,x) \le (d(p,q) + d(p,x))^2 = d^2(p,x) + d^2(p,q) + 2d(p,q)\,d(p,x).$$

Integrating with respect to $Q(dx)$ and using $\int d(p,x)\,Q(dx) \le F^{1/2}(p)$ (Cauchy-Schwarz), we get

$$F(q) \le F(p) + d^2(p,q) + 2d(p,q)F^{1/2}(p),$$

$$|F(q) - F(p)| \le d^2(p,q) + 2d(p,q)\max\{F^{1/2}(p), F^{1/2}(q)\}. \tag{1.2}$$

Hence if $F(p)$ is finite for some $p$, it is finite for all $p$, and $p \mapsto F(p)$ is continuous on $M$. To show that a minimizer exists when $F$ is finite, let $c$ denote the infimum of $F$, and let $\{p_n : n \ge 1\}$ be a sequence such that $F(p_n) \to c$. Now

$$d^2(p_n,p_1) \le 2d^2(p_n,x) + 2d^2(x,p_1), \quad x \in M,$$

$$\Rightarrow\ d^2(p_n,p_1) \le 2F(p_n) + 2F(p_1), \tag{1.3}$$

proving that $B = \{p_n : n \ge 1\}$ is a bounded set, so that its closure $\bar B$ is compact (note that $\mathrm{diam}(\bar B) = \mathrm{diam}(B) < \infty$). Then $\{p_n : n \ge 1\}$ has a convergent subsequence $p_{n_k} \to p^*$ as $k \to \infty$. By continuity of $F$, $F(p^*) = c$, so that $p^*$ is a minimizer of $F$. Also if $m$ is an arbitrary minimizer, $d^2(m,p^*) \le 2F(m) + 2F(p^*) = 4c$. Hence the set of all minimizers is a bounded set, say, $D$. Since every point in $\bar D$ is a limit of a sequence in $D$, and $F$ is continuous, every point in $\bar D$ is a minimizer of $F$. Hence $D = \bar D$ is compact.

Theorem 1.2 (Consistency of the Sample Frechet Mean). Assume (1) that every closed bounded subset of $M$ is compact, and (2) $F$ is finite on $M$. Then (a) given any $\epsilon > 0$, there exists a $P$-null set $N$ and $n(\omega) < \infty\ \forall \omega \in N^c$, such that the Frechet sample mean set of the empirical distribution $Q_n = Q_n(\omega)$ is contained in the $\epsilon$-neighborhood of the Frechet mean set of $Q$ $\forall n \ge n(\omega)$, and (b) if the Frechet mean of $Q$ exists (as a unique minimizer of $F$), then every measurable selection from the Frechet sample mean set is a strongly consistent estimator of the Frechet mean of $Q$.

Proof See Bhattacharya & Patrangenaru, Theorem 2.3 [1].

Note that a sequence of estimators $\theta_n$ defined on a probability space $(\Omega, P, \mathcal{B})$ is said to be a strongly consistent estimator of a parameter $\theta$ if $\theta_n(\omega) \to \theta$ as $n \to \infty$ for every $\omega$ outside of a $P$-null set.

Remark 1.1 It is known that a connected Riemannian manifold $M$ which is complete (in its geodesic distance) satisfies the topological hypothesis of Proposition 1.1: every closed bounded subset of $M$ is compact. We will see sufficient conditions for the existence of the Frechet mean of $Q$ (as a unique minimizer of the Frechet function $F$ of $Q$) in the subsequent sections.

Now we deduce the asymptotic distribution of the sample mean, after proper scaling and translation, in case there is a population mean. This result can be used to construct asymptotic confidence sets for the population mean based on the sample analogue, and for non-parametric testing. We shall see these applications in the planar shape space.

Theorem 1.3 (Asymptotic distribution of the Frechet sample mean) Suppose the following assumptions hold:

A1 $Q$ has support in a single coordinate patch $(U, \phi)$, $\phi : U \to \mathbb{R}^d$ smooth. Let $Y_j = \phi(X_j)$, $j = 1, \ldots, n$, and $Q^\phi = Q \circ \phi^{-1}$.

A2 The Frechet mean $\mu_F$ of $Q$ is unique.

A3 $\forall x$, $y \mapsto h(x,y) = (d^\phi)^2(x,y) = d^2(\phi^{-1}x, \phi^{-1}y)$ is twice continuously differentiable in a neighborhood of $\phi(\mu_F) = \mu$.

A4 $E(D_r h(Y_1,\mu))^2 < \infty\ \forall r$.

A5 $E \sup_{|u-v| \le \epsilon} |D_sD_r h(Y_1,v) - D_sD_r h(Y_1,u)| \to 0$ as $\epsilon \to 0$, $\forall r, s$.

A6 $\Lambda = ((E\,D_sD_r h(Y_1,\mu)))$ is nonsingular.

A7 $\Sigma = \mathrm{Cov}\,Dh(Y_1,\mu)$ is nonsingular.

Let $\mu_{F,n}$ be a measurable selection from the Frechet sample mean set, and $\mu_n = \phi(\mu_{F,n})$. Then under the assumptions A1-A7,

$$\sqrt{n}(\mu_n - \mu) \xrightarrow{\mathcal{L}} N(0, \Lambda^{-1}\Sigma(\Lambda')^{-1}).$$

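Proof Let $F(y) = \int (d^\phi)^2(x,y)\,Q^\phi(dx)$. Its minimizer is $\mu$. Similarly define

$$F_n(y) = \int (d^\phi)^2(x,y)\,Q^\phi_n(dx) = \frac{1}{n}\sum_{j=1}^n (d^\phi)^2(Y_j,y), \qquad Q^\phi_n = \frac{1}{n}\sum_{j=1}^n \delta_{Y_j}.$$

Its minimizer is $\mu_n$. So, for $1 \le r \le d$,

$$0 = \frac{1}{\sqrt n}\sum_{j=1}^n D_r h(Y_j,\mu_n) = \frac{1}{\sqrt n}\sum_{j=1}^n D_r h(Y_j,\mu) + \sum_{s=1}^d \sqrt n(\mu_n-\mu)^{(s)}\,\frac{1}{n}\sum_{j=1}^n D_sD_r h(Y_j,\mu) + \sum_{s=1}^d \sqrt n(\mu_n-\mu)^{(s)}(\epsilon_n)_{rs}, \tag{1.4}$$

where

$$(\epsilon_n)_{rs} = \frac{1}{n}\sum_{j=1}^n \left[D_sD_r h(Y_j,\theta_n) - D_sD_r h(Y_j,\mu)\right]$$

for some $\theta_n$ lying on the line segment joining $\mu$ and $\mu_n$. Hence

$$\left[\left(\left(\frac{1}{n}\sum_{j=1}^n D_sD_r h(Y_j,\mu) + \epsilon_n\right)\right)\right]\sqrt n(\mu_n-\mu) = -\frac{1}{\sqrt n}\sum_{j=1}^n Dh(Y_j,\mu)$$

$$\Rightarrow\ \sqrt n(\mu_n-\mu) = -\Lambda^{-1}\left(\frac{1}{\sqrt n}\sum_{j=1}^n Dh(Y_j,\mu)\right) + o_P(1) \tag{1.5}$$

$$\Rightarrow\ \sqrt n(\mu_n-\mu) \xrightarrow{\mathcal{L}} -\Lambda^{-1}N(0,\Sigma) = N(0,\Lambda^{-1}\Sigma(\Lambda')^{-1}). \tag{1.6}$$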

1.2 Frechet Variation

Definition 1.2 The Frechet variation $V$ of $Q$ is the minimum value attained by the Frechet function $F$ on $M$. Similarly, the minimum value attained by the sample Frechet function

$$F_n(p) = \frac{1}{n}\sum_{i=1}^n d^2(X_i,p)$$

is called the sample Frechet variation.

From Proposition 1.1, it follows that if the Frechet function is finite for some $p$, then the variation is finite and is attained by all $p$ in the Frechet mean set. Similarly the sample variation is the value of $F_n$ on the sample mean set.
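As a small worked check of Definition 1.2, consider the real line with $d(x,y) = |x-y|$. There the sample Frechet function is minimized at the arithmetic mean, so the sample Frechet variation coincides with the (biased) sample variance. The sketch below verifies this on a hypothetical four-point sample; the data and function name are illustrative assumptions.

```python
import numpy as np

def sample_frechet_function(p, sample):
    """F_n(p) = (1/n) * sum_i d^2(X_i, p) for the Euclidean distance on R."""
    return np.mean((sample - p) ** 2)

sample = np.array([1.0, 2.0, 4.0, 9.0])      # hypothetical data

# On the real line the unique minimizer of F_n is the arithmetic mean.
frechet_mean = sample.mean()
frechet_variation = sample_frechet_function(frechet_mean, sample)

# F_n evaluated on a grid of other candidate points, for comparison.
grid_values = [sample_frechet_function(p, sample) for p in np.linspace(0.0, 10.0, 101)]
```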

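Theorem 1.4 (Consistency of the Sample Variation) Suppose every closed and bounded subset of $M$ is compact, and $F$ is finite on $M$. Then the sample Frechet variation is a strongly consistent estimator of the population variation.

Proof Let $\theta_n$ be a measurable selection from the sample Frechet mean set. Then the sample variation is $V_n = F_n(\theta_n)$. Fix $\epsilon > 0$. By Theorem 1.2, there exists a $P$-null set $A$ such that for all $\omega \in A^c$ there exists $N(\omega) < \infty$ such that for all $n \ge N(\omega)$, $\theta_n(\omega)$ is contained in the $\epsilon$-neighborhood of the Frechet mean set of $Q$. From Proposition 1.1, the Frechet mean set is compact. So we may assume that $\forall n \ge N(\omega)$, $\theta_n(\omega) \in K$, where $K$ is a compact set containing the Frechet mean set of $Q$. Choose $\tilde\theta_n$ in the Frechet mean set of $Q$, so that $V = E\,d^2(X_1,\tilde\theta_n)$. Then

$$|V_n - V| = \left|\frac{1}{n}\sum_{j=1}^n d^2(X_j,\theta_n) - E\,d^2(X_1,\tilde\theta_n)\right|$$

$$\le \left|\frac{1}{n}\sum_{j=1}^n \left(d^2(X_j,\theta_n) - d^2(X_j,\tilde\theta_n)\right)\right| + \left|\frac{1}{n}\sum_{j=1}^n d^2(X_j,\tilde\theta_n) - E\,d^2(X_1,\tilde\theta_n)\right|$$

$$\le \frac{1}{n}\sum_{j=1}^n \left|d^2(X_j,\theta_n) - d^2(X_j,\tilde\theta_n)\right| + \sup_{\theta \in K}\left|\frac{1}{n}\sum_{j=1}^n d^2(X_j,\theta) - E\,d^2(X_1,\theta)\right|. \tag{1.7}$$

By Theorem 2.3 [1], the second term on the RHS of (1.7) goes to 0. Let us assume that it is less than $\epsilon$ for all $n \ge N(\omega)$. The first term on the RHS of (1.7) is

$$\frac{1}{n}\sum_{j=1}^n \left|d^2(X_j,\theta_n) - d^2(X_j,\tilde\theta_n)\right| \le \frac{1}{n}\sum_{j=1}^n \left[d(X_j,\theta_n) + d(X_j,\tilde\theta_n)\right] d(\theta_n,\tilde\theta_n) \le \frac{2}{n}\sum_{j=1}^n \sup_{\theta \in K} d(X_j,\theta)\; d(\theta_n,\tilde\theta_n).$$

Now

$$\sup_{\theta \in K} d(X_1,\theta) \le d(X_1,\theta_0) + \sup_{\theta \in K} d(\theta_0,\theta) \quad \text{for any } \theta_0 \in K,$$

so

$$E \sup_{\theta \in K} d(X_1,\theta) \le E\,d(X_1,\theta_0) + \sup_{\theta \in K} d(\theta_0,\theta) < \infty.$$

So $\frac{2}{n}\sum_{j=1}^n \sup_{\theta \in K} d(X_j,\theta)$ is bounded almost surely; for simplicity let us assume that it is less than 1 for all $\omega \in A^c$ and $n \ge N(\omega)$. Choose the sequence $\tilde\theta_n$ such that $d(\theta_n(\omega),\tilde\theta_n) < \epsilon$ for $\omega \in A^c$ and $n > N(\omega)$. This is possible by Theorem 1.2. Then the first term on the RHS of (1.7) is less than $\epsilon$. So $|V_n(\omega) - V| < 2\epsilon$ for all $\omega \in A^c$ and $n > N(\omega)$. Since $P(A^c) = 1$ and $\epsilon$ is arbitrary, this proves that $V_n \xrightarrow{a.s.} V$.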

Remark 1.2 In the above theorem, we did not need the Frechet mean to exist. The sample variation is a consistent estimator of the true variation even when the Frechet function of $Q$ does not have a unique minimizer.

Next we deduce the asymptotic distribution of the sample variation when there is a unique population mean. This result fails if we do not have a unique mean.

Theorem 1.5 Assume the same set-up as in Theorem 1.3. Then under assumptions A1-A7,

$$\sqrt n(F_n(\mu_n) - F(\mu)) \xrightarrow{\mathcal{L}} N(0, \mathrm{Var}\,d^2(X_1,\mu_F)).$$

Proof

$$\sqrt n(F_n(\mu_n) - F(\mu)) = \sqrt n(F_n(\mu_n) - F_n(\mu)) + \sqrt n(F_n(\mu) - F(\mu)) \tag{1.8}$$

$$\sqrt n(F_n(\mu_n) - F_n(\mu)) = \frac{1}{\sqrt n}\sum_{i=1}^n\sum_{r=1}^d (\mu_n-\mu)_r D_r h(Y_i,\mu) + \frac{1}{2\sqrt n}\sum_{i=1}^n\sum_{r=1}^d\sum_{s=1}^d (\mu_n-\mu)_r(\mu_n-\mu)_s D_sD_r h(Y_i,\mu^*_n) \tag{1.9}$$

for some $\mu^*_n$ on the line segment joining $\mu$ and $\mu_n$. Now

$$\frac{1}{n}\sum_{i=1}^n D_sD_r h(Y_i,\mu^*_n) = \frac{1}{n}\sum_{i=1}^n D_sD_r h(Y_i,\mu) + \frac{1}{n}\sum_{i=1}^n \left(D_sD_r h(Y_i,\mu^*_n) - D_sD_r h(Y_i,\mu)\right). \tag{1.10}$$

Under Assumption (A6), the first term on the RHS of (1.10) converges in probability to $\Lambda_{sr}$, while the second term is $o_P(1)$ (converges to zero in probability) by (A5). So (1.10) is bounded in probability. The second term on the RHS of (1.9) is

$$\frac{1}{2}\sum_{r=1}^d\sum_{s=1}^d \sqrt n(\mu_n-\mu)_r(\mu_n-\mu)_s\,\frac{1}{n}\sum_{i=1}^n D_sD_r h(Y_i,\mu^*_n). \tag{1.11}$$

By Theorem 1.3, $\sqrt n(\mu_n-\mu)$ is asymptotically normal, so (1.11) is $o_P(1)$. The first term on the RHS of (1.9) is

$$\left\langle \sqrt n(\mu_n-\mu),\ \frac{1}{n}\sum_{i=1}^n Dh(Y_i,\mu)\right\rangle. \tag{1.12}$$

Since $\frac{1}{n}\sum_{i=1}^n Dh(Y_i,\mu) \to E\,Dh(Y_1,\mu) = 0$ and $\sqrt n(\mu_n-\mu)$ is asymptotically normal, (1.12) is $o_P(1)$. This proves that (1.9) is $o_P(1)$. Hence (1.8) becomes

$$\sqrt n(F_n(\mu_n) - F(\mu)) = \sqrt n(F_n(\mu) - F(\mu)) + o_P(1).$$

Since

$$F_n(\mu) - F(\mu) = \frac{1}{n}\sum_{i=1}^n \left(d^2(X_i,\mu_F) - E\,d^2(X_1,\mu_F)\right),$$

we get

$$\sqrt n(F_n(\mu_n) - F(\mu)) = \frac{1}{\sqrt n}\sum_{i=1}^n \left(d^2(X_i,\mu_F) - E\,d^2(X_1,\mu_F)\right) + o_P(1). \tag{1.13}$$

By the CLT applied to the iid sequence $d^2(X_j,\mu_F)$, (1.13) converges in law to $N(0, \mathrm{Var}\,d^2(X_1,\mu_F))$.

From now on let $(M,g)$ be a $d$-dimensional connected complete Riemannian manifold, $g$ being the Riemannian metric on $M$. We shall come across different notions of means and variations depending on the distance chosen on $M$. First we start with the 'extrinsic distance'.

2 Extrinsic Mean and Variation

Let $\phi : M \to \mathbb{R}^k$ be an isometric map of $M$ onto $\tilde M = \phi(M) \subset \mathbb{R}^k$. Define the metric on $M$ as $d(x,y) = \|\phi(x) - \phi(y)\|$, where $\|\cdot\|$ denotes the Euclidean norm ($\|u\|^2 = \sum_{i=1}^k u_i^2$, $u = (u_1, u_2, \ldots, u_k)$).

Assume $\tilde M$ is a closed subset of $\mathbb{R}^k$. Then for every $u \in \mathbb{R}^k$ there exists a compact set of points in $\tilde M$ whose distance from $u$ is the smallest among all points in $\tilde M$. We will denote this set by $P_{\tilde M}u = \{x \in \tilde M : \|x-u\| \le \|y-u\|\ \forall y \in \tilde M\}$. If this set is a singleton, $u$ is said to be a non-focal point of $\mathbb{R}^k$ (wrt $\tilde M$); otherwise it is said to be a focal point of $\mathbb{R}^k$.

Definition 2.1 Let $(M,d)$, $\phi$ be as above. Let $Q$ be a probability measure on $M$ such that the Frechet function $F(x) = \int d^2(x,y)\,Q(dy) < \infty$ ($\forall x$). The Frechet mean (set) of $Q$ is called the extrinsic mean (set) of $Q$, and the Frechet variation of $Q$ is called its extrinsic variation.
If $X_i$ ($i = 1, \ldots, n$) are iid observations from $Q$, and $Q_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$ is the empirical distribution, then the Frechet mean (set) of $Q_n$ is called the extrinsic sample mean (set) and the Frechet variation of $Q_n$ is called the extrinsic sample variation.

Let $\tilde Q$ and $\tilde Q_n$ be the images of $Q$ and $Q_n$ respectively on $\mathbb{R}^k$: $\tilde Q = Q \circ \phi^{-1}$, $\tilde Q_n = Q_n \circ \phi^{-1}$. The next result gives us a way to calculate the extrinsic mean and establishes the consistency of the sample mean as an estimator of the population mean, if that exists.

Theorem 2.1 (Consistency of the Extrinsic Sample Mean) Assume the same set-up as in Definition 2.1. (a) If $\mu = \int_{\mathbb{R}^k} u\,\tilde Q(du)$ is the mean of $\tilde Q$, then the extrinsic mean set of $Q$ is given by $\phi^{-1}(P_{\tilde M}\mu)$. (b) If $\mu$ is a nonfocal point of $\mathbb{R}^k$ (relative to $\tilde M$), then the extrinsic sample mean $\mu_n$ (any measurable selection from the extrinsic mean set of $Q_n$) is a strongly consistent estimator of the extrinsic mean $\mu_E = \phi^{-1}(P_{\tilde M}\mu)$.

Proof (a) In view of the isometry $\phi : M \to \tilde M$, it is enough to prove that the Frechet function $\tilde F(u) = \int_{\tilde M} \|u-v\|^2\,\tilde Q(dv)$ (on $\tilde M$) is minimum iff $u \in P_{\tilde M}\mu$. Now for arbitrary $u, v \in \tilde M$, one has

$$\|u-v\|^2 = \|u-\mu\|^2 + \|\mu-v\|^2 + 2(u-\mu)\cdot(\mu-v),$$

where $\cdot$ denotes the Euclidean inner product. Integrate both sides wrt $\tilde Q(dv)$ to get

$$\tilde F(u) = \|u-\mu\|^2 + \int_{\tilde M} \|\mu-v\|^2\,\tilde Q(dv), \tag{2.1}$$

since the integral of the product term equals $2(u-\mu)\cdot(\mu-\mu) = 0$. Minimization of the RHS of (2.1) is achieved precisely for $u \in P_{\tilde M}\mu$, since the second term on the right does not involve $u$.

(b) Since $\tilde M$ is closed, closed bounded subsets of $\tilde M$ are compact. Therefore the consistency of the sample mean follows from Theorem 1.2, if one notes that $P_{\tilde M}\mu$ is the unique minimizer of $\tilde F$, since $\mu$ is non-focal.

Now we see examples of some Riemannian manifolds where we compute the extrinsic mean and variation.

2.1 Example 1: Unit sphere $S^{k-1}$

Consider the inclusion map $i : S^{k-1} \to \mathbb{R}^k$, $i(x) = x$. The extrinsic mean set of a probability measure $Q$ on $S^{k-1}$ is then the point (set) $P_{S^{k-1}}\mu$ on $S^{k-1}$ closest to $\mu = \int_{\mathbb{R}^k} x\,\tilde Q(dx)$, where $\tilde Q$ is $Q$ regarded as a probability measure on $\mathbb{R}^k$. Note that $\mu$ is non-focal iff $\mu \ne 0$. Then $P_{S^{k-1}}\mu = \mu/\|\mu\|$; else $P_{S^{k-1}}(0) = S^{k-1}$.
From (2.1), the extrinsic variation of $Q$ is

$$V = \int_{\mathbb{R}^k} \|x-\mu\|^2\,\tilde Q(dx) + (\|\mu\| - 1)^2 = 2(1 - \|\mu\|).$$

In particular, $\|\mu\| = 1$ (equivalently $V = 0$) iff $Q$ is degenerate at a point.
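The sample version of this example can be sketched as follows: the extrinsic sample mean is the normalized Euclidean mean, and the sample extrinsic variation equals $2(1 - \|\bar x\|)$. The simulated data, seed and function name below are assumptions chosen only for illustration.

```python
import numpy as np

def extrinsic_mean_sphere(points):
    """Project the Euclidean mean of unit vectors back onto the sphere.

    Returns the projected mean and the norm of the Euclidean mean.
    """
    euclid_mean = points.mean(axis=0)
    norm = np.linalg.norm(euclid_mean)
    if norm == 0.0:
        # Focal case: every point of the sphere is a minimizer.
        raise ValueError("Euclidean mean is 0, extrinsic mean is not unique")
    return euclid_mean / norm, norm

# Hypothetical sample on S^2: normalized Gaussian vectors centered near the north pole.
rng = np.random.default_rng(0)
raw = rng.normal(loc=[0.0, 0.0, 3.0], scale=1.0, size=(200, 3))
points = raw / np.linalg.norm(raw, axis=1, keepdims=True)

mean_dir, mean_norm = extrinsic_mean_sphere(points)

# Sample extrinsic variation from the squared chord distances ...
variation_direct = np.mean(np.sum((points - mean_dir) ** 2, axis=1))
# ... and from the closed form 2(1 - ||mean||).
variation_closed_form = 2.0 * (1.0 - mean_norm)
```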

2.2 Example 2: Axial Space $\mathbb{R}P^{k-1}$

Consider the real projective space $\mathbb{R}P^{k-1}$ of all lines $\{\lambda x : \lambda \in \mathbb{R} \setminus \{0\}\}$ through the origin in $\mathbb{R}^k$, $x \ne 0$. Each such line is specified by its points of intersection with the unit sphere $S^{k-1}$. In other words, $\mathbb{R}P^{k-1}$ may be regarded as the quotient space of $S^{k-1}$ under the equivalence relation $u \sim v$ iff $u = \pm v$. The elements of $\mathbb{R}P^{k-1}$ may be represented as $[u] = \{-u, u\}$ ($u \in S^{k-1}$).
Another representation of $\mathbb{R}P^{k-1}$ is via the Veronese-Whitney embedding $\phi$ of $\mathbb{R}P^{k-1}$ into the space of all $k \times k$ matrices, identified with $\mathbb{R}^{k^2}$ by arranging the $k^2$ elements of a matrix as a $k^2$-dimensional vector, with $\phi$ given by

$$\phi([u]) = uu' = ((u_iu_j))_{1 \le i,j \le k} \quad (u = (u_1, \ldots, u_k)' \in S^{k-1}). \tag{2.2}$$

As $\mathbb{R}^{k^2}$, the space of $k \times k$ matrices has the Euclidean distance

$$\|A-B\|^2 \equiv \sum_{1 \le i,j \le k} (a_{ij} - b_{ij})^2 = \mathrm{Trace}\,(A-B)(A-B)'. \tag{2.3}$$

If we now define the extrinsic distance $d$ on $\mathbb{R}P^{k-1}$ as

$$d^2([u],[v]) = \|uu' - vv'\|^2 = \mathrm{Trace}\,(uu' - vv')^2, \tag{2.4}$$

then $(\mathbb{R}P^{k-1}, d)$ satisfies the hypothesis of Proposition 1.1. Also $\phi(\mathbb{R}P^{k-1})$ is closed.
Let $Q$ be a probability measure on $\mathbb{R}P^{k-1}$, and let $\tilde\mu$ be the mean of $\tilde Q = Q \circ \phi^{-1}$ considered as a probability measure on $\mathbb{R}^{k^2}$. Since $\tilde\mu$ is a mixture of elements of the form $uu'$, $\tilde\mu \in S_+(k,\mathbb{R})$: the space of all symmetric non-negative definite $k \times k$ matrices with real elements. We need to identify the set of all nonfocal $\tilde\mu$ in $S_+(k,\mathbb{R})$. Let then $\tilde\mu \in S_+(k,\mathbb{R})$. There exists an orthogonal $k \times k$ matrix $T$ such that $T\tilde\mu T' = D \equiv \mathrm{Diag}(\lambda_1, \ldots, \lambda_k)$, where the eigenvalues may be taken to be ordered: $0 \le \lambda_1 \le \ldots \le \lambda_k$. To find $P_{\tilde M}\tilde\mu$ with $\tilde M = \phi(\mathbb{R}P^{k-1}) \equiv$ set of all matrices of the form (2.2), note first that, writing $v = Tu$,

$$\|\tilde\mu - uu'\|^2 \equiv \mathrm{Trace}\,(\tilde\mu - uu')(\tilde\mu - uu') = \mathrm{Trace}\,(T(\tilde\mu - uu')T')(T(\tilde\mu - uu')T') = \mathrm{Trace}\,(D - vv')^2. \tag{2.5}$$

Hence

$$\|\tilde\mu - uu'\|^2 = \sum_{i=1}^k (\lambda_i - v_i^2)^2 + \sum_{j \ne j'} (v_jv_{j'})^2$$

$$= \sum_{i=1}^k \lambda_i^2 + \sum_{i=1}^k v_i^4 - 2\sum_{i=1}^k \lambda_iv_i^2 + \Big(\sum_j v_j^2\Big)\Big(\sum_{j'} v_{j'}^2\Big) - \sum_{j=1}^k v_j^4$$

$$= \sum_{i=1}^k \lambda_i^2 - 2\sum_{i=1}^k \lambda_iv_i^2 + 1. \tag{2.6}$$

The minimum is achieved if $v = (0, 0, \ldots, 0, 1)' = e_k$ (since $\lambda_k$ is the largest eigenvalue of $\tilde\mu$). Since $\mu \equiv T'v = T'e_k$ is an eigenvector of $\tilde\mu$ having the eigenvalue $\lambda_k$, the minimum distance between $\tilde\mu$ and $\tilde M$ is attained by $\mu\mu'$, where $\mu$ is a unit vector in the eigenspace of the largest eigenvalue of $\tilde\mu$. Thus $\tilde\mu$ is nonfocal iff its largest eigenvalue is simple, i.e., iff the eigenspace corresponding to the largest eigenvalue is one-dimensional. In that case the extrinsic mean of $Q$ is $[\mu]$. Therefore, we have the following consequence of Theorem 2.1.

Proposition 2.2 Assume that the largest eigenvalue of $\tilde\mu = \int uu'\,Q(d[u])$ is simple. Let $\mu_n$ denote a unit eigenvector of $\frac{1}{n}\sum_{i=1}^n X_iX_i'$ (where $X_i$, $1 \le i \le n$, are iid such that $[X_1]$ has distribution $Q$) having the largest eigenvalue. Then $[\mu_n]$ is a strongly consistent estimator of the extrinsic mean $[\mu]$.

Also from (2.1) and (2.6), we get the extrinsic variation of $Q$ to be

$$V = E\|X_1X_1' - \mu\mu'\|^2 = E\|X_1X_1' - \tilde\mu\|^2 + \sum_{i=1}^k \lambda_i^2 - 2\lambda_k + 1$$

$$= E\,\mathrm{Trace}\,(X_1X_1' - \tilde\mu)^2 + \sum_{i=1}^k \lambda_i^2 - 2\lambda_k + 1$$

$$= 1 - \mathrm{Trace}(\tilde\mu^2) + \sum_{i=1}^k \lambda_i^2 - 2\lambda_k + 1 = 2(1 - \lambda_k).$$
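A minimal sketch of the sample computation for axial data, assuming simulated unit vectors with arbitrary signs: the extrinsic sample mean axis is a top eigenvector of $\frac{1}{n}\sum X_iX_i'$, and the sample variation computed from the embedded distances agrees with $2(1 - \lambda_k)$. The data, seed and names are hypothetical.

```python
import numpy as np

def extrinsic_mean_axial(units):
    """Top unit eigenvector and eigenvalue of (1/n) sum_i X_i X_i'."""
    second_moment = units.T @ units / units.shape[0]
    eigvals, eigvecs = np.linalg.eigh(second_moment)   # ascending eigenvalues
    return eigvecs[:, -1], eigvals[-1]

# Hypothetical axial sample concentrated around the third coordinate axis.
rng = np.random.default_rng(1)
raw = rng.normal(size=(300, 3)) * np.array([0.3, 0.3, 1.0])
units = raw / np.linalg.norm(raw, axis=1, keepdims=True)
# The sign of each representative is irrelevant for an axis [u] = {-u, u}.
units *= rng.choice([-1.0, 1.0], size=(300, 1))

axis, lam_max = extrinsic_mean_axial(units)

# Embedded observations X_i X_i' and their squared distance to the mean axis.
outer = np.einsum("ni,nj->nij", units, units)
variation_direct = np.mean(np.sum((outer - np.outer(axis, axis)) ** 2, axis=(1, 2)))
variation_closed_form = 2.0 * (1.0 - lam_max)
```

Working with $X_iX_i'$ rather than $X_i$ is what makes the result independent of the sign chosen for each observation.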


2.3 Example 3: Planar Shape Space of k-ads

Consider a set of $k$ points on the plane, e.g., $k$ locations on a skull projected on a plane, not all points being the same. We will assume $k > 2$ and refer to such a set as a k-ad (or a set of $k$ landmarks). For convenience we will denote a k-ad by $k$ complex numbers ($z_j = x_j + iy_j$, $1 \le j \le k$), i.e., we will represent k-ads on the complex plane.

By the shape of a k-ad $z = (z_1, z_2, \ldots, z_k)$, we mean the equivalence class, or orbit, of $z$ under translation, rotation and scaling. To remove translation, one may subtract $\langle z\rangle \equiv (\langle z\rangle, \langle z\rangle, \ldots, \langle z\rangle)$ (where $\langle z\rangle = \frac{1}{k}\sum_{j=1}^k z_j$) from $z$ to get $z - \langle z\rangle$. Rotation of the k-ad by an angle $\theta$ and scaling (by a factor $r > 0$) are achieved by multiplying $z - \langle z\rangle$ by the complex number $\lambda = r\exp(i\theta)$. Hence one may represent the shape of the k-ad as the complex line passing through $z - \langle z\rangle$, namely, $\{\lambda(z - \langle z\rangle) : \lambda \in \mathbb{C} \setminus \{0\}\}$.

Thus the space of shapes of k-ads is the set of all complex lines on the (complex $(k-1)$-dimensional) hyperplane $H^{k-1} = \{w \in \mathbb{C}^k \setminus \{0\} : \sum_{1}^k w_j = 0\}$. So the shape space $\Sigma^k_2$ of planar k-ads has the structure of the complex projective space $\mathbb{C}P^{k-2}$: the space of all complex lines through the origin in $\mathbb{C}^{k-1}$. As in the case of $\mathbb{C}P^{k-2}$, it is convenient to represent the element of $\Sigma^k_2$ corresponding to a k-ad $z$ by the curve

$$\gamma(z) = [z] = \left\{e^{i\theta}\frac{z - \langle z\rangle}{\|z - \langle z\rangle\|} : 0 \le \theta < 2\pi\right\}$$

on the unit sphere in $H^{k-1} \approx \mathbb{C}^{k-1}$. Denote by $u$ the quantity $\frac{z - \langle z\rangle}{\|z - \langle z\rangle\|}$. It is called the preshape of the shape of $z$. Then the Veronese-Whitney embedding of $\Sigma^k_2$ is given by

$$\phi : \Sigma^k_2 \to \mathbb{C}^{k^2}, \qquad \phi([z]) = uu^* = ((u_i\bar u_j))_{1 \le i,j \le k} \quad (u = (u_1, \ldots, u_k)' \in H^{k-1},\ \|u\| = 1). \tag{2.7}$$

The shape of $z$, $[z] = \{e^{i\theta}u : 0 \le \theta < 2\pi\}$, is the orbit of the vector $u$ under rotation. Note that if $v_1, v_2 \in [z]$, then $\phi([v_1]) = \phi([v_2]) = uu^*$.
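The construction of the preshape and of the embedding (2.7) can be sketched as below. The example checks numerically that a translated, rotated and rescaled copy of a k-ad has the same embedded image; the specific 4-ad and the function names are illustrative assumptions.

```python
import numpy as np

def preshape(z):
    """Center a k-ad (complex vector) and scale it to unit norm."""
    centred = z - z.mean()
    return centred / np.linalg.norm(centred)

def veronese_whitney(z):
    """Embed the shape of z as the Hermitian matrix u u^* of its preshape u."""
    u = preshape(z)
    return np.outer(u, u.conj())      # entry (i, j) is u_i * conj(u_j)

# Hypothetical 4-ad (corners of a rectangle) and a moved copy of it:
# scaled by 3, rotated by 0.7 radians, translated by 5 - 2i.
z = np.array([0 + 0j, 2 + 0j, 2 + 1j, 0 + 1j])
moved = 3.0 * np.exp(1j * 0.7) * z + (5 - 2j)

same_image = np.allclose(veronese_whitney(z), veronese_whitney(moved))
```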

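Define the extrinsic distance $d$ on $\Sigma^k_2$ by that induced from this embedding, namely,

$$d^2([z],[w]) = \|uu^* - vv^*\|^2, \qquad u \doteq \frac{z - \langle z\rangle}{\|z - \langle z\rangle\|}, \quad v \doteq \frac{w - \langle w\rangle}{\|w - \langle w\rangle\|}, \tag{2.8}$$

where for arbitrary $k \times k$ complex matrices $A$, $B$,

$$\|A-B\|^2 = \sum_{j,j'} |a_{jj'} - b_{jj'}|^2 = \mathrm{Trace}\,(A-B)(A-B)^* \tag{2.9}$$

is just the squared Euclidean distance between $A$ and $B$ regarded as elements of $\mathbb{C}^{k^2}$ (or, $\mathbb{R}^{2k^2}$).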

Since the matrices $uu^*$, $vv^*$ in (2.8) are Hermitian, one notes that the image $\phi(\Sigma^k_2)$ of $\Sigma^k_2$ is a closed subset of $\mathbb{C}^{k^2}$, and the "conjugate-transpose" symbol $*$ may be dropped from (2.9) in computing distances in $\phi(\Sigma^k_2)$.
Let $Q$ be a probability measure on the shape space $\Sigma^k_2$, and let $\mu_0$ denote the mean vector of $Q_0 \doteq Q \circ \phi^{-1}$, regarded as a probability measure on $\mathbb{C}^{k^2}$ (or, $\mathbb{R}^{2k^2}$). Note that $\mu_0$ belongs to the convex hull of $\phi(\Sigma^k_2)$ and in particular satisfies $\mu_0 1_k = 0$ ($1_k$ being the vector of ones), since every preshape lies in $H^{k-1}$. Let $T$ be a unitary $k \times k$ matrix such that $T\mu_0T^* = D = \mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_k)$, where $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_k$ are the eigenvalues of $\mu_0$. Then, writing $v = Tu$ with $u$ as in (2.8),

$$\|uu^* - \mu_0\|^2 = \|vv^* - D\|^2 = \sum_{j=1}^k (|v_j|^2 - \lambda_j)^2 + \sum_{j \ne j'} |v_j\bar v_{j'}|^2$$

$$= \sum_j \lambda_j^2 + \sum_{j=1}^k |v_j|^4 - 2\sum_{j=1}^k \lambda_j|v_j|^2 + \sum_{j=1}^k |v_j|^2 \cdot \sum_{j'=1}^k |v_{j'}|^2 - \sum_{j=1}^k |v_j|^4$$

$$= \sum_j \lambda_j^2 + 1 - 2\sum_{j=1}^k \lambda_j|v_j|^2,$$

which is minimized (on $\phi(\Sigma^k_2)$) by taking $v = e_k = (0, \ldots, 0, 1)'$, i.e., $u = T^*e_k$, a unit eigenvector having the largest eigenvalue $\lambda_k$ of $\mu_0$. It follows that the extrinsic mean $\mu_E$, say, of $Q$ is unique iff the eigenspace for the largest eigenvalue of $\mu_0$ is (complex) one-dimensional, and then $\mu_E = [w]$, $w\ (\ne 0) \in$ the eigenspace of the largest eigenvalue of $\mu_0$. It follows from Theorem 2.1 that any measurable selection from the sample extrinsic mean set is a consistent estimator of $\mu_E$ iff the largest eigenvalue of $\mu_0$ is simple, i.e., it has an eigenspace of complex dimension one.
By similar analysis as in the real projective space, one can show that the extrinsic variation of $Q$ has the expression

$$V = 2(1 - \lambda_k).$$
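The eigenvector characterization above translates into a short computation. The sketch below simulates noisy, arbitrarily posed copies of a hypothetical 5-landmark template, forms the sample mean of the embedded preshapes, and reads off the extrinsic sample mean shape (top eigenvector) and sample variation $2(1 - \lambda_k)$. The template, noise level, seed and names are all assumptions for illustration.

```python
import numpy as np

def preshape(z):
    """Center a k-ad and scale it to unit norm."""
    centred = z - z.mean()
    return centred / np.linalg.norm(centred)

def extrinsic_mean_shape(kads):
    """Top unit eigenvector and all eigenvalues of the mean of u u^* over the sample."""
    preshapes = np.array([preshape(z) for z in kads])
    # Entry (i, j) is the sample average of u_i * conj(u_j).
    mean_matrix = np.einsum("ni,nj->ij", preshapes, preshapes.conj()) / len(kads)
    eigvals, eigvecs = np.linalg.eigh(mean_matrix)   # ascending, Hermitian input
    return eigvecs[:, -1], eigvals

rng = np.random.default_rng(2)
template = np.array([0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j, 0.5 + 1.5j])
kads = []
for _ in range(50):
    noise = 0.02 * (rng.normal(size=5) + 1j * rng.normal(size=5))
    pose = rng.uniform(0.5, 2.0) * np.exp(1j * rng.uniform(0, 2 * np.pi))
    kads.append(pose * (template + noise) + rng.normal() + 1j * rng.normal())

mean_preshape, eigvals = extrinsic_mean_shape(kads)
sample_variation = 2.0 * (1.0 - eigvals[-1])

# |u^* v| equals 1 exactly when two preshapes represent the same shape.
alignment = abs(np.vdot(preshape(template), mean_preshape))
```

The eigenvector returned by `np.linalg.eigh` is determined only up to a unit complex factor, which is harmless here because that factor is a rotation and does not change the shape.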


The distance $d$ on $\Sigma^k_2$ (see (2.8)) may be expressed as

$$d^2([z],[w]) \equiv \|uu^* - vv^*\|^2 = \sum_{j,j'} |u_j\bar u_{j'} - v_j\bar v_{j'}|^2$$

$$= \sum_{j,j'} |u_j\bar u_{j'}|^2 + \sum_{j,j'} |v_j\bar v_{j'}|^2 - 2\,\mathrm{Re}\sum_{j,j'} u_j\bar u_{j'}\bar v_jv_{j'}$$

$$= \sum_j |u_j|^2 \sum_{j'} |u_{j'}|^2 + \sum_j |v_j|^2 \sum_{j'} |v_{j'}|^2 - 2(u^*v)(v^*u)$$

$$= 2 - 2(u^*v)(v^*u) = 2(1 - |u^*v|^2), \tag{2.10}$$

with $u$ and $v$ as in (2.8). This is the so-called full Procrustes distance for $\Sigma^k_2$ (see Dryden and Mardia [6]).
Let $[z]$ and $[w]$ be two shapes and let $u$ and $v$ be their preshapes. Then the Procrustes coordinates of $v$ onto $u$ are defined as

$$v^P = (a + ib)1_k + \beta e^{i\theta}v,$$

where $(\beta, \theta, a, b)$ are chosen to minimize

$$D^2(u,v) = \|u - \beta e^{i\theta}v - (a + ib)1_k\|^2.$$

Then one gets

$$a = b = 0, \qquad \beta e^{i\theta} = v^*u,$$

so that

$$v^P = (v^*u)v \quad \text{and} \quad D^2(u,v) = 1 - |v^*u|^2 = \tfrac{1}{2}d^2([z],[w]).$$

Here $1_k$ denotes the vector of ones of length $k$, that is, $1_k = (1, 1, \ldots, 1)'$.
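The identities (2.10) and $D^2(u,v) = \frac{1}{2}d^2([z],[w])$ can be checked numerically with a short sketch. The two 4-ads below are hypothetical, and the helper names are assumptions.

```python
import numpy as np

def preshape(z):
    """Center a k-ad and scale it to unit norm."""
    centred = z - z.mean()
    return centred / np.linalg.norm(centred)

def full_procrustes(z, w):
    """Return d^2([z],[w]) and the Procrustes coordinates of w's preshape onto z's preshape."""
    u, v = preshape(z), preshape(w)
    coeff = np.vdot(v, u)                  # v^* u = beta * exp(i * theta)
    d2 = 2.0 * (1.0 - abs(coeff) ** 2)     # equation (2.10)
    return d2, coeff * v                   # v^P = (v^* u) v

z = np.array([0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j])
w = np.array([0 + 0j, 2 + 0j, 2.2 + 1.9j, -0.1 + 2j])

d2, v_p = full_procrustes(z, w)
u, v = preshape(z), preshape(w)

# Residual D^2(u, v) at the optimal (beta, theta, a, b).
residual = np.linalg.norm(u - v_p) ** 2
# Squared Euclidean (Frobenius) distance between the embedded preshapes.
embedded = np.linalg.norm(np.outer(u, u.conj()) - np.outer(v, v.conj())) ** 2

# A rotated, scaled and translated copy of z has distance zero from z.
d2_same, _ = full_procrustes(z, 2j * z + 3)
```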

As a numerical example, consider 8 locations on a gorilla skull projected on a plane. There are 30 female and 29 male samples [Source: Statistical Shape Analysis, Dryden & Mardia]. Figures 1 and 2 plot the Procrustes coordinates of the female and male samples, respectively, onto their extrinsic sample means.

3 Asymptotic Distribution of the Extrinsic Sample Mean

One can use Theorem 1.3 to get the asymptotic distribution of the sample mean. However, expressions for the parameters $\Lambda$ and $\Sigma$ are not easy to get. Here we devise another way to deduce the asymptotic distribution.
We are in the same set-up as at the start of Section 2: $\phi$ is an embedding of $M$ into $\mathbb{R}^k$. The mean $\mu$ of the image $\tilde Q = Q \circ \phi^{-1}$ is a non-focal point of $\mathbb{R}^k$, so that the projection $P(\mu)$ of $\mu$ on $\phi(M)$ is unique, and the extrinsic mean of $Q$ is $\mu_E = \phi^{-1}P(\mu)$.

Let $\bar Y = \frac{1}{n}\sum_{j=1}^n Y_j$ denote the sample mean of $Y_j = \phi(X_j)$, where $X_1, \ldots, X_n$ is a random sample from $Q$. The extrinsic sample mean is $\phi^{-1}(P(\bar Y))$, where $P(\bar Y)$ is the projection of $\bar Y$ on $\phi(M)$. In a neighborhood of a nonfocal point such as $\mu$, $P(\cdot)$ is smooth. Write

$$\sqrt n[P(\bar Y) - P(\mu)] = \sqrt n(d_\mu P)(\bar Y - \mu) + o_P(1) = (d_\mu P)(\sqrt n(\bar Y - \mu)) + o_P(1), \tag{3.1}$$

where $d_\mu P$ is the differential (map) of the projection $P(\cdot)$, which takes vectors in the tangent space of $\mathbb{R}^k$ at $\mu$ to tangent vectors of $\phi(M)$ at $P(\mu)$. Let $f_1, f_2, \ldots, f_d$ be an orthonormal basis of $T_{P(\mu)}\phi(M)$ and $e_1, e_2, \ldots, e_k$ be an orthonormal basis (frame) of $\mathbb{R}^k$. One has

$$\sqrt n(\bar Y - \mu) = \sum_{j=1}^k \sqrt n(\bar Y - \mu)_j e_j,$$

$$d_\mu P(\sqrt n(\bar Y - \mu)) = \sum_{j=1}^k \sqrt n(\bar Y - \mu)_j\, d_\mu P(e_j) = \sum_{j=1}^k \sqrt n(\bar Y - \mu)_j \sum_{r=1}^d a_{jr}(\mu)f_r = \sum_{r=1}^d \Big[\sum_{j=1}^k a_{jr}(\mu)\sqrt n(\bar Y - \mu)_j\Big]f_r, \tag{3.2}$$

where $a_{jr}(\mu) = d_\mu P(e_j) \cdot f_r$. Hence $\sqrt n[P(\bar Y) - P(\mu)]$ has an asymptotic Gaussian distribution on the tangent space of $\phi(M)$ at $P(\mu)$, with mean vector zero and a dispersion matrix (wrt the basis vectors $\{f_r : 1 \le r \le d\}$)

$$\Sigma = A'VA, \qquad A = A(\mu) = ((a_{jr}(\mu)))_{1 \le j \le k,\,1 \le r \le d}, \tag{3.3}$$

$V$ being the $k \times k$ covariance matrix of $\tilde Q$ (wrt the basis $\{e_j : 1 \le j \le k\}$). In matrix notation,

$$\sqrt n A'(\bar Y - \mu) \xrightarrow{\mathcal{L}} N(0,\Sigma) \quad \text{as } n \to \infty. \tag{3.4}$$

This implies, writing $\mathcal{X}^2_d$ for the chi-square distribution with $d$ degrees of freedom,

$$n(\bar Y - \mu)'A\Sigma^{-1}A'(\bar Y - \mu) \longrightarrow \mathcal{X}^2_d, \qquad n(\bar Y - \mu)'A\hat\Sigma^{-1}A'(\bar Y - \mu) \longrightarrow \mathcal{X}^2_d \quad \text{as } n \to \infty. \tag{3.5}$$

Here

$$\hat\Sigma = (A(\mu))'\hat VA(\mu), \qquad \hat V = \Big(\Big(\frac{1}{n}\sum_{i=1}^n (Y_{ij} - \mu_j)(Y_{ij'} - \mu_{j'})\Big)\Big)_{j,j'=1}^k. \tag{3.6}$$

A confidence region for $P(\mu)$ with asymptotic confidence level $1 - \alpha$ is then given by

$$\{P(\mu) : n(\bar Y - \mu)'A\hat\Sigma^{-1}A'(\bar Y - \mu) \le \mathcal{X}^2_d(1 - \alpha)\}. \tag{3.7}$$

Note that $A = A(\mu)$ depends on $\mu$. The bootstrapped version of the statistic in (3.7) is

$$U^* = n(\bar Y^* - \bar Y)'A(\bar Y)(A(\bar Y)'V^*A(\bar Y))^{-1}A'(\bar Y)(\bar Y^* - \bar Y), \qquad V^* = \Big(\Big(\frac{1}{n}\sum_{i=1}^n (Y^*_{ij} - \bar Y_j)(Y^*_{ij'} - \bar Y_{j'})\Big)\Big)_{j,j'=1}^k. \tag{3.8}$$

The corresponding bootstrapped confidence region is given by

$$\{P(\mu) : n(\bar Y - \mu)'A\hat\Sigma^{-1}A'(\bar Y - \mu) \le c^*_{(1-\alpha)}\}, \tag{3.9}$$

where $c^*_{(1-\alpha)}$ is the upper $(1-\alpha)$-quantile of the bootstrapped values $U^*$. An alternative to (3.7), simpler to compute, is given by

$$\{P(\mu) : n(\bar Y - \mu)'A(\bar Y)\hat{\hat\Sigma}^{-1}A(\bar Y)'(\bar Y - \mu) \le \mathcal{X}^2_d(1 - \alpha)\}, \qquad \hat{\hat\Sigma} = A'(\bar Y)\hat VA(\bar Y). \tag{3.10}$$

But then the corresponding bootstrapped version becomes even more computation intensive than (3.8).

3.1 Asymptotic distribution of the mean shape

As an application, let us find the asymptotic distribution of the sample extrinsic mean shape of a sample of size $n$ from the planar shape space. $M = \Sigma^k_2$ can be embedded into $S(k,\mathbb{C})$, the space of all $k \times k$ complex self-adjoint matrices, via the map $\phi$ in (2.7). We consider $S(k,\mathbb{C})$ as a linear subspace of $\mathbb{C}^{k^2}$ (over $\mathbb{R}$) and as such a regular submanifold of $\mathbb{C}^{k^2}$, embedded by the inclusion map and inheriting the metric

$$\langle A,B\rangle = \mathrm{Re}\,\mathrm{Trace}(AB^*).$$

The dimension of $S(k,\mathbb{C})$ is $k^2$. An orthonormal basis for $S(k,\mathbb{C})$ is given by $\{v^a_b : 1 \le a \le b \le k\}$ and $\{w^a_b : 1 \le a < b \le k\}$:

$$v^a_b = \begin{cases} \frac{1}{\sqrt 2}(e_ae_b^t + e_be_a^t), & a < b \\ e_ae_a^t, & a = b \end{cases} \qquad w^a_b = \frac{i}{\sqrt 2}(e_ae_b^t - e_be_a^t), \quad a < b,$$

where $\{e_a : 1 \le a \le k\}$ is the standard canonical basis for $\mathbb{R}^k$.

We also take $\{v^a_b : 1 \le a \le b \le k\}$ and $\{w^a_b : 1 \le a < b \le k\}$ as the orthonormal frame for $TS(k,\mathbb{C}) \equiv S(k,\mathbb{C})$. Note that for all $U \in O(k)$ (here the group of $k \times k$ matrices with $UU^* = U^*U = I$), $\{Uv^a_bU^* : 1 \le a \le b \le k\}$, $\{Uw^a_bU^* : 1 \le a < b \le k\}$ is also an orthonormal frame for $S(k,\mathbb{C})$.

Assume that the mean $\mu$ of $\tilde Q = Q \circ \phi^{-1}$ has its largest eigenvalue simple. Then from (3.1), one has

$$\sqrt n[P(\bar Y) - P(\mu)] = d_\mu P(\sqrt n(\bar Y - \mu)) + o_P(1). \tag{3.11}$$

Here we view $d_\mu P : S(k,\mathbb{C}) \to T_{P(\mu)}\phi(\Sigma^k_2)$. Choose $U \in O(k)$ such that $U^*\mu U = D \equiv \mathrm{Diag}(\lambda_1, \ldots, \lambda_k)$, $\lambda_1 \le \ldots \le \lambda_{k-1} < \lambda_k$ being the eigenvalues of $\mu$.

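Choose a basis $\{Uv^a_bU^*, Uw^a_bU^*\}$ for $S(k,\mathbb{C})$. Then one can show that

$$d_\mu P(Uv^a_bU^*) = \begin{cases} 0 & \text{if } 1 \le a \le b < k \text{ or } a = b = k, \\ (\lambda_k - \lambda_a)^{-1}Uv^a_kU^* & \text{if } 1 \le a < k,\ b = k, \end{cases}$$

and

$$d_\mu P(Uw^a_bU^*) = \begin{cases} 0 & \text{if } 1 \le a < b < k, \\ (\lambda_k - \lambda_a)^{-1}Uw^a_kU^* & \text{if } 1 \le a < k,\ b = k. \end{cases} \tag{3.12}$$

Write

$$\sqrt n(\bar Y - \mu) = \sum\sum_{1 \le a \le b \le k} \langle \sqrt n(\bar Y - \mu), Uv^a_bU^*\rangle\, Uv^a_bU^* + \sum\sum_{1 \le a < b \le k} \langle \sqrt n(\bar Y - \mu), Uw^a_bU^*\rangle\, Uw^a_bU^*. \tag{3.13}$$

Since $\bar Y1_k = \mu 1_k = 0$, we have $\lambda_1 = 0$ and $U(:,1) = \alpha 1_k$, $|\alpha| = \frac{1}{\sqrt k}$. It is easy to check that $\langle \sqrt n(\bar Y - \mu), Uv^1_bU^*\rangle = \langle \sqrt n(\bar Y - \mu), Uw^1_bU^*\rangle = 0$. So from (3.12) and (3.13),

$$d_\mu P(\sqrt n(\bar Y - \mu)) = \sum_{a=2}^{k-1} \langle \sqrt n(\bar Y - \mu), Uv^a_kU^*\rangle (\lambda_k - \lambda_a)^{-1}Uv^a_kU^* + \sum_{a=2}^{k-1} \langle \sqrt n(\bar Y - \mu), Uw^a_kU^*\rangle (\lambda_k - \lambda_a)^{-1}Uw^a_kU^*. \tag{3.14}$$

From (3.11) and (3.14), we see that $\sqrt n(P(\bar Y) - P(\mu))$ has an asymptotic Gaussian distribution on a subspace of $S(k,\mathbb{C})$, with asymptotic coordinates

$$T_n(\mu) = \Big(\langle \sqrt n(\bar Y - \mu), Uv^a_kU^*\rangle_{a=2}^{k-1},\ \langle \sqrt n(\bar Y - \mu), Uw^a_kU^*\rangle_{a=2}^{k-1}\Big)$$

wrt the basis vectors $\{(\lambda_k - \lambda_a)^{-1}Uv^a_kU^*, (\lambda_k - \lambda_a)^{-1}Uw^a_kU^*\}_{a=2}^{k-1}$. Then

$$T_n(\mu)'\Sigma(\mu)^{-1}T_n(\mu) \longrightarrow \mathcal{X}^2_{2k-4}.$$

Write $\tilde U = U(:, 2:k-1)$. Then

$$T_n(\mu) = \frac{1}{\sqrt n}\sum_{j=1}^n T_j(\mu), \quad \text{where} \quad T_j(\mu)' = \sqrt 2\big(\mathrm{Re}(\tilde U^*Y_jU(:,k))',\ \mathrm{Im}(\tilde U^*Y_jU(:,k))'\big), \qquad \Sigma(\mu) = E\,T_j(\mu)T_j(\mu)'.$$

One can estimate $\Sigma(\mu)$ by $\hat\Sigma(\mu)$, the sample covariance matrix, or by $\hat\Sigma(\bar Y)$, the sample covariance matrix with $U$ replaced by the eigenvectors of $\bar Y$, as in (3.6) and (3.10). This will give confidence sets for $P(\mu)$ as in (3.7) and (3.10).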

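3.2 A two sample testing problem

Let $Q_1$ and $Q_2$ be two probability measures on the shape space $\Sigma^k_2$, and let $\mu_1$ and $\mu_2$ denote the mean vectors of $Q_1 \circ \phi^{-1}$ and $Q_2 \circ \phi^{-1}$ respectively. Suppose $[x_1], \ldots, [x_n]$ and $[y_1], \ldots, [y_m]$ are iid random samples from $Q_1$ and $Q_2$ respectively. Let $X_i = \phi([x_i])$, $Y_i = \phi([y_i])$ be their images in $\phi(\Sigma^k_2)$, which are random samples from $Q_1 \circ \phi^{-1}$ and $Q_2 \circ \phi^{-1}$ respectively. Suppose we are to test if the extrinsic means of $Q_1$ and $Q_2$ are equal, i.e.,

$$H_0 : P\mu_1 = P\mu_2.$$

We assume that both $\mu_1$ and $\mu_2$ have simple largest eigenvalues. Then under $H_0$, the corresponding eigenvectors differ by a rotation. Choose $\mu \in S(k,\mathbb{C})$ with the same projection as $\mu_1$ and $\mu_2$. Suppose $\mu = U\Lambda U^*$, where $\Lambda = \mathrm{Diag}(\lambda_1 \le \lambda_2 \le \ldots < \lambda_k)$ are its eigenvalues and $U = [U_1, U_2, \ldots, U_k]$ are the corresponding eigenvectors. Under $H_0$, $P\mu_1 = P\mu_2 = U_kU_k^*$.

From Section 3.1,

$$d_\mu P(\bar X - \mu) = \sum_{a=2}^{k-1} \sqrt 2\,\mathrm{Re}(U_a^*\bar XU_k)(\lambda_k - \lambda_a)^{-1}Uv^a_kU^* + \sum_{a=2}^{k-1} \sqrt 2\,\mathrm{Im}(U_a^*\bar XU_k)(\lambda_k - \lambda_a)^{-1}Uw^a_kU^*$$

$$= \sum_{a=2}^{k-1} (\lambda_k - \lambda_a)^{-1}(U_a^*\bar XU_k)U_aU_k^* + \sum_{a=2}^{k-1} (\lambda_k - \lambda_a)^{-1}(U_k^*\bar XU_a)U_kU_a^* \tag{3.15}$$

and

$$d_\mu P(\bar Y - \mu) = \sum_{a=2}^{k-1} \sqrt 2\,\mathrm{Re}(U_a^*\bar YU_k)(\lambda_k - \lambda_a)^{-1}Uv^a_kU^* + \sum_{a=2}^{k-1} \sqrt 2\,\mathrm{Im}(U_a^*\bar YU_k)(\lambda_k - \lambda_a)^{-1}Uw^a_kU^*$$

$$= \sum_{a=2}^{k-1} (\lambda_k - \lambda_a)^{-1}(U_a^*\bar YU_k)U_aU_k^* + \sum_{a=2}^{k-1} (\lambda_k - \lambda_a)^{-1}(U_k^*\bar YU_a)U_kU_a^*. \tag{3.16}$$

Let

$$T_n(\mu) = \big(\mathrm{Re}(U_a^*\bar XU_k),\ \mathrm{Im}(U_a^*\bar XU_k)\big)_{a=2}^{k-1} \quad \text{and} \quad S_m(\mu) = \big(\mathrm{Re}(U_a^*\bar YU_k),\ \mathrm{Im}(U_a^*\bar YU_k)\big)_{a=2}^{k-1}.$$

Then under $H_0$, $T_n(\mu)$ and $S_m(\mu)$ have mean zero, and as $n, m \to \infty$,

$$\sqrt n\,T_n(\mu) \xrightarrow{\mathcal{L}} N(0,\Sigma_1(\mu)) \quad \text{and} \quad \sqrt m\,S_m(\mu) \xrightarrow{\mathcal{L}} N(0,\Sigma_2(\mu)).$$

Suppose $\frac{n}{m+n} \to p$, $\frac{m}{m+n} \to q$, for some $p, q > 0$; $p + q = 1$. Then

$$\sqrt{n+m}\,(T_n(\mu) - S_m(\mu)) = \sqrt{n+m}\,\big(\mathrm{Re}(U_a^*(\bar X - \bar Y)U_k),\ \mathrm{Im}(U_a^*(\bar X - \bar Y)U_k)\big)_{a=2}^{k-1} \xrightarrow{\mathcal{L}} N_{2k-4}\Big(0, \frac{1}{p}\Sigma_1 + \frac{1}{q}\Sigma_2\Big).$$

So

$$(n+m)(T_n(\mu) - S_m(\mu))'\Big(\frac{1}{p}\Sigma_1(\mu) + \frac{1}{q}\Sigma_2(\mu)\Big)^{-1}(T_n(\mu) - S_m(\mu)) \xrightarrow{\mathcal{L}} \mathcal{X}^2_{2k-4}. \tag{3.17}$$

We can choose $\mu$ to be any positive linear combination of $\mu_1$ and $\mu_2$. Then under $H_0$, $\mu$ will have the same projection on $\phi(\Sigma^k_2)$ as $\mu_1$ and $\mu_2$. We may take $\mu = p\mu_1 + q\mu_2$.

In practice, since $\mu_1$ and $\mu_2$ are unknown, so is $\mu$. Then we may estimate $\mu$ by the pooled sample mean, $\hat\mu = \frac{n\bar X + m\bar Y}{m+n}$, and $\Sigma_1(\mu)$ and $\Sigma_2(\mu)$ by their sample estimates $\hat\Sigma_1(\hat\mu)$ and $\hat\Sigma_2(\hat\mu)$, where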

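$$\hat\Sigma_1(\hat\mu) = \frac{1}{n}X(\hat\mu)X(\hat\mu)' - \bar X(\hat\mu)\bar X(\hat\mu)',$$

$$X(\hat\mu)_{ij} = \begin{cases} \mathrm{Re}(U_{i+1}^*X_jU_k) & \text{if } 1 \le i \le k-2,\ 1 \le j \le n, \\ \mathrm{Im}(U_{i-k+3}^*X_jU_k) & \text{if } k-1 \le i \le 2k-4,\ 1 \le j \le n, \end{cases} \qquad \bar X(\hat\mu) = \frac{1}{n}\sum_{j=1}^n X(\hat\mu)_{\cdot j},$$

and

$$\hat\Sigma_2(\hat\mu) = \frac{1}{m}Y(\hat\mu)Y(\hat\mu)' - \bar Y(\hat\mu)\bar Y(\hat\mu)',$$

$$Y(\hat\mu)_{ij} = \begin{cases} \mathrm{Re}(U_{i+1}^*Y_jU_k) & \text{if } 1 \le i \le k-2,\ 1 \le j \le m, \\ \mathrm{Im}(U_{i-k+3}^*Y_jU_k) & \text{if } k-1 \le i \le 2k-4,\ 1 \le j \le m, \end{cases} \qquad \bar Y(\hat\mu) = \frac{1}{m}\sum_{j=1}^m Y(\hat\mu)_{\cdot j}.$$

Here $U_1, \ldots, U_k$ denote the eigenvectors of $\hat\mu$, ordered by increasing eigenvalue. Then the two sample test statistic in (3.17) can be estimated by

$$(\bar X(\hat\mu) - \bar Y(\hat\mu))'\Big(\frac{1}{n}\hat\Sigma_1(\hat\mu) + \frac{1}{m}\hat\Sigma_2(\hat\mu)\Big)^{-1}(\bar X(\hat\mu) - \bar Y(\hat\mu)). \tag{3.18}$$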
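The statistic (3.18) can be sketched in code as follows. This is an illustrative reading of the formulas above, not the computation used for the skull data: the simulated landmark configurations, noise level, seed and all function names are assumptions. The coordinates use the identity $U_a^*(uu^*)U_k = (U_a^*u)(u^*U_k)$ for an embedded preshape $u$.

```python
import numpy as np

def preshape(z):
    """Center a k-ad and scale it to unit norm."""
    centred = z - z.mean()
    return centred / np.linalg.norm(centred)

def tangent_coordinates(preshapes, U):
    """Rows: (Re, Im) of U_a^* (u u^*) U_k for a = 2, ..., k-1 (1-based), one row per observation."""
    top = preshapes.conj() @ U[:, -1]          # u_j^* U_k for each observation j
    proj = preshapes @ U[:, 1:-1].conj()       # entry (j, a): U_a^* u_j
    coords = proj * top[:, None]
    return np.hstack([coords.real, coords.imag])

def two_sample_statistic(x_kads, y_kads):
    """Estimated statistic (3.18) and its degrees of freedom 2k - 4."""
    X = np.array([preshape(z) for z in x_kads])
    Y = np.array([preshape(z) for z in y_kads])
    n, m = len(X), len(Y)
    # Pooled mean of the embedded samples, (n * Xbar + m * Ybar) / (n + m).
    pooled = (np.einsum("ni,nj->ij", X, X.conj())
              + np.einsum("ni,nj->ij", Y, Y.conj())) / (n + m)
    _, U = np.linalg.eigh(pooled)              # columns ordered by ascending eigenvalue
    tx, ty = tangent_coordinates(X, U), tangent_coordinates(Y, U)
    cov_x = np.cov(tx, rowvar=False, bias=True)   # (1/n) sum x x' - xbar xbar'
    cov_y = np.cov(ty, rowvar=False, bias=True)
    diff = tx.mean(axis=0) - ty.mean(axis=0)
    stat = diff @ np.linalg.solve(cov_x / n + cov_y / m, diff)
    return stat, tx.shape[1]

# Hypothetical simulated data: two templates differing in the last landmark.
rng = np.random.default_rng(3)
template = np.array([0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j, 0.5 + 1.5j])
shifted = template + np.array([0, 0, 0, 0, 0.3])

def simulate(base, size):
    noise = 0.02 * (rng.normal(size=(size, 5)) + 1j * rng.normal(size=(size, 5)))
    return base + noise

stat_same, dof = two_sample_statistic(simulate(template, 40), simulate(template, 40))
stat_diff, _ = two_sample_statistic(simulate(template, 40), simulate(shifted, 40))
```

The statistic would then be compared with a chi-square quantile with `dof` degrees of freedom. The eigenvectors returned by `np.linalg.eigh` are fixed only up to unit complex factors, but such factors rotate the (Re, Im) coordinates of both samples in the same way and leave the quadratic form unchanged.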

For the skull data discussed in Section 2.3, suppose we are to test if the male and female populations have the same mean shape. Figure 3 is the plot of the full Procrustes coordinates for the (sample) extrinsic mean shapes of female and male skulls onto the extrinsic mean for the pooled sample.
Value of the test statistic in (3.18) = 392.68.
P-value for the test using the chi-square approximation = 0.
So we reject $H_0$ and conclude that the mean shapes are different.

4 Asymptotic distribution of Extrinsic Variation

We are in the same set-up as at the start of Section 2. Suppose $V$ is the extrinsic variation of $Q$, and $V_n$ its sample analogue. Then from Theorem 1.5,

$$\sqrt n(V_n - V) \xrightarrow{\mathcal{L}} N(0,\sigma^2), \tag{4.1}$$

where

$$\sigma^2 = \int_M (d^2(x,\mu_E) - V)^2\,Q(dx).$$

The above result can be used to construct an asymptotic level $1 - \alpha$ confidence interval for the population variation, which is given by

$$\Big(V_n - \frac{s}{\sqrt n}Z_{1-\frac{\alpha}{2}},\ V_n + \frac{s}{\sqrt n}Z_{1-\frac{\alpha}{2}}\Big), \tag{4.2}$$

where $s^2 = \frac{1}{n}\sum_{j=1}^n (d^2(X_j,\mu_{nE}) - V_n)^2$ is the sample estimate of $\sigma^2$, $\mu_{nE}$ is the sample extrinsic mean and $Z_{1-\frac{\alpha}{2}}$ is the $(1 - \frac{\alpha}{2})$ quantile of the standard normal distribution.

For the gorilla skull data, the 95% confidence intervals for the variations of females and males are:
Females: (0.0031, 0.0046)
Males: (0.0034, 0.0056)
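The interval (4.2) is straightforward to compute once the squared distances $d^2(X_j,\mu_{nE})$ are available. The sketch below uses made-up squared distances (not the gorilla skull data) and the standard-library normal quantile; the function name and inputs are assumptions.

```python
import numpy as np
from statistics import NormalDist

def variation_confidence_interval(sq_dists, alpha=0.05):
    """Interval (4.2) from d^2(X_j, sample extrinsic mean), j = 1, ..., n."""
    sq_dists = np.asarray(sq_dists, dtype=float)
    n = sq_dists.size
    v_n = sq_dists.mean()                              # sample extrinsic variation
    s = np.sqrt(np.mean((sq_dists - v_n) ** 2))        # estimate of sigma
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)        # standard normal quantile
    half_width = z * s / np.sqrt(n)
    return v_n - half_width, v_n + half_width

# Hypothetical squared distances, for illustration only.
sq_dists = [0.004, 0.003, 0.005, 0.0035, 0.0045, 0.004]
low, high = variation_confidence_interval(sq_dists)
```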

4.1 Testing equality of Extrinsic Variations

Result (4.1) can used to construct a non parametric test for testing whethertwo populations on the shape space have the same spread.We are in the set up of Section 3.2. Let V1 and V2 denote the variations of Q1

and Q2 respectively, and V1n and V2m denote their sample analogues. Thenthe null hypothesis is

H0 : V1 = V2 = V

Under H0, from (4.1)

√n(V1n − V )

L−→ N(0, σ21) (4.3)

√m(V2m − V )

L−→ N(0, σ22) (4.4)

where σ21 =

∫

Σk2

(d2([u], [µ1E ]) − V )2Q1(d[u])

and σ22 =

∫

Σk2

(d2([u], [µ2E ]) − V )2Q2(d[u]).

22

Suppose nm+n

→ p, mm+n

→ q, for some p, q > 0; p + q = 1. Then from(4.3) and (4.4)

√n + m(V1n − V2m)

L−→ N(0,

(

σ21

p+

σ22

q

)

) (4.5)

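So

\[ \frac{V_{1n} - V_{2m}}{\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}}} \xrightarrow{\;L\;} N(0, 1) \qquad (4.6) \]

where $s_1^2 = \frac{1}{n}\sum_{j=1}^{n} \big(d^2([x_j], [\mu_{nE}]) - V_{1n}\big)^2$ and $s_2^2 = \frac{1}{m}\sum_{j=1}^{m} \big(d^2([y_j], [\mu_{mE}]) - V_{2m}\big)^2$ are the sample estimates of $\sigma_1^2$ and $\sigma_2^2$ respectively, and $[\mu_{nE}]$ and $[\mu_{mE}]$ are the sample mean shapes.

For the shape space, the test statistic in (4.6) becomes

\[ T_{nm} = \frac{2(\lambda_{km} - \lambda_{kn})}{\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}}} \]

where $\lambda_{kn}$ and $\lambda_{km}$ are the largest eigenvalues of $X$ and $Y$ respectively. The P-value for the test is $P = P(|Z| > |T_{nm}|)$ where $Z \sim N(0, 1)$. We accept $H_0$ for large values of $P$.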

For the skull data, $T_{nm} = -0.923$ and $P = 0.356$. So we accept $H_0$; that is, the two populations have the same average spread around their respective means.
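The two-sample statistic in (4.6) and its two-sided P-value can be sketched as follows. This is only an illustration under the assumption that the squared extrinsic distances of each sample to its own sample mean shape are already available as lists; the names `variation_test` and `two_sided_p` are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_p(t):
    """P(|Z| > |t|) for Z ~ N(0, 1)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def variation_test(sq1, sq2):
    """Statistic of (4.6) and its P-value.

    sq1, sq2: squared distances of each sample to its own sample mean,
    assumed precomputed.
    """
    n, m = len(sq1), len(sq2)
    v1 = sum(sq1) / n                              # V_{1n}
    v2 = sum(sq2) / m                              # V_{2m}
    s1 = sum((d - v1) ** 2 for d in sq1) / n       # s_1^2
    s2 = sum((d - v2) ** 2 for d in sq2) / m       # s_2^2
    t = (v1 - v2) / math.sqrt(s1 / n + s2 / m)
    return t, two_sided_p(t)


# P-value for the statistic value reported in the text.
print(round(two_sided_p(-0.923), 3))
```

Applied to the reported value $T_{nm} = -0.923$, `two_sided_p` gives approximately 0.356, in agreement with the P-value quoted above.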

5 Intrinsic Mean and Variation

Let $(M, g)$ be a $d$-dimensional connected complete Riemannian manifold, $g$ being the Riemannian metric on $M$. Let the distance $d = d_g$ be the geodesic distance under $g$. Let $Q$ be a probability distribution on $M$ with finite Frechet function.

Definition 5.1 The Frechet mean set of $Q$ under the distance $d_g$ is called its Intrinsic Mean set. The Frechet Variation of $Q$ under $d_g$ is called its Intrinsic Variation. Let $X_1, X_2, \ldots, X_n$ be iid observations on $M$ with common distribution $Q$. The sample Frechet mean set is called the sample Intrinsic Mean set and the sample Frechet Variation is called the sample Intrinsic Variation.

Let us define a few technical terms related to Riemannian manifolds which we will use extensively in the subsequent sections. For more rigorous definitions see Lee: Riemannian Manifolds [8].

1. Geodesic: These are curves $\gamma$ with zero acceleration, i.e. $\ddot\gamma = 0$. They are locally length minimizing curves. E.g. great circles on $S^n$, straight lines in $\mathbb{R}^n$.

2. Exponential map: For $p \in M$, $V \in T_pM$, we define $\exp_p V = \gamma(1)$, where $\gamma$ is a geodesic with $\gamma(0) = p$ and $\dot\gamma(0) = V$.

3. Cut locus: Let $\gamma$ be a geodesic starting at $p$, $\gamma(0) = p$. Let $t_0$ be the supremum of all $t$ for which $\gamma$ is length minimizing on $[0, t]$. Then $\gamma(t_0)$ is called the cut point of $p$ along $\gamma$. The cut locus of $p$, $C(p)$, is the set of all cut points of $p$ along all geodesics. E.g. $C(p) = \{-p\}$ on $S^n$.

4. Convex ball: A ball $B$ is called convex if for any $p, q \in B$, the shortest geodesic from $p$ to $q$ is unique in $M$ and lies in $B$. E.g. any ball of radius $\pi/2$ on $S^n$ is convex.

5. Sectional Curvature: For a curve $\gamma$, its signed curvature at $\gamma(t)$ is $\pm|\ddot\gamma(t)|$, $+$ if the curve is pointing towards $N$ and $-$ if it is pointing away from $N$, where $N$ is a chosen normal field along $\gamma$. For a 2-dimensional manifold, choose a basis $(X, Y)$ for $T_pM$. Then the sectional curvature at $p$ is

\[ \frac{Rm(X, Y, Y, X)}{|X|^2 |Y|^2 - \langle X, Y \rangle^2}, \]

where $Rm$ is the 'Riemann Curvature Tensor'. For a $d$-dimensional manifold, consider the 2-D submanifold swept out by geodesics with initial velocities lying in a 2-D subspace $\Pi$ of $T_pM$. Then the sectional curvature of that submanifold is the sectional curvature at $p$ associated with $\Pi$, $K(\Pi)$.

The next result gives a sufficient condition for the existence (and uniqueness) of the Intrinsic Mean.

Proposition 5.1 Suppose all sectional curvatures on $M$ are bounded above by some $C \ge 0$. Suppose the support of $Q$ is contained in a convex ball of radius $r$, $B(p, r)$ (wrt $d_g$), where

\[ r = \begin{cases} \infty & \text{if } C = 0 \\ \dfrac{\pi}{4\sqrt{C}} & \text{if } C > 0 \end{cases} \]

Then the Frechet function $F$ of $Q$ is strictly convex on $B(p, r)$, has a unique global minimum, which is attained in $B(p, r)$ and is the unique local minimum of $F$ on $B(p, 2r)$. Hence the intrinsic mean of $Q$ exists (as a unique minimizer of $F$) and lies in $B(p, r)$.
Proof See Theorem 1.2, KARCHER [3] and Theorem 1, LE [4].
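The radius condition of Proposition 5.1 is easy to evaluate for a given curvature bound. The helper below is a minimal sketch; the function name `mean_ball_radius` is hypothetical. For the curvature bounds used later in the text (1 for the unit sphere, 4 for the planar shape space) it returns $\pi/4$ and $\pi/8$.

```python
import math


def mean_ball_radius(curvature_bound):
    """Radius r of Proposition 5.1 for an upper curvature bound C >= 0."""
    if curvature_bound < 0:
        raise ValueError("Proposition 5.1 is stated for C >= 0")
    if curvature_bound == 0:
        return math.inf
    return math.pi / (4.0 * math.sqrt(curvature_bound))


print(mean_ball_radius(1.0))   # pi/4, the unit sphere case
print(mean_ball_radius(4.0))   # pi/8, the planar shape space case
```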

In case the population intrinsic mean exists, from Theorem 1.2, the sample mean is a consistent estimator of the true mean. Now we deduce the asymptotic distribution of the sample mean after proper translation and scaling.

6 Asymptotic distribution of the sample Intrinsic mean

One can use Theorem 1.3 to get the asymptotic distribution of the sample intrinsic mean. For that we need to verify assumptions A1 to A7. The next result gives sufficient conditions for those assumptions to hold.

Proposition 6.1 Suppose the support of $Q$ is contained in a convex geodesic ball $B(p, r)$ with center $p$ and radius $r$, $r$ as in Proposition 5.1, which is disjoint from the cut locus of $p$, $C(p)$. Let $\phi = Exp_p^{-1} : B(p, r) \to T_pM\ (\approx \mathbb{R}^d)$. Define $h(x, y) = d_g^2(\phi^{-1}x, \phi^{-1}y)$; $x, y \in \mathbb{R}^d$. Let $((D_r h))_{r=1}^{d}$ and $((D_r D_s h))_{r,s=1}^{d}$ be the matrices of first and second order derivatives of $y \mapsto h(x, y)$. Let $Y_j = \phi(X_j)$; $j = 1, \ldots, n$; $X_1, \ldots, X_n$ being iid observations from $Q$. Let $\mu = \phi(\mu_I)$, $\mu_I$ being the intrinsic mean of $Q$. Let $\mu_n = \phi(\mu_{n,I})$, $\mu_{n,I}$ being a measurable selection from the sample mean set of the $X_j$'s. Define $\Lambda = E((D_r D_s h(Y_1, \mu)))_{r,s=1}^{d}$; $\Sigma = Cov((D_r h(Y_1, \mu)))_{r=1}^{d}$. Then $\Lambda$ and $\Sigma$ are positive definite and

\[ \sqrt{n}\,(\mu_n - \mu) \xrightarrow{\;L\;} N(0, \Lambda^{-1}\Sigma\Lambda^{-1}) \]

Proof See Theorem 2.2, Bhattacharya, R. and Patrangenaru, V. [2].

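Proposition 6.1 can be used to construct an asymptotic $1 - \alpha$ confidence set for $\mu_I$, which is given by

\[ \big\{ \mu_I : n(\mu_n - \mu)^t (\hat\Lambda^{-1}\hat\Sigma\hat\Lambda^{-1})^{-1} (\mu_n - \mu) \le \mathcal{X}^2_d(1 - \alpha) \big\} \]

where $\mathcal{X}^2_d(1 - \alpha)$ is the upper $(1 - \alpha)$th quantile of the chi-squared distribution with $d$ degrees of freedom and

\[ \hat\Lambda = \frac{1}{n}\sum_{i=1}^{n} D_r D_s h(Y_i, \mu_n), \qquad \hat\Sigma = \frac{1}{n}\sum_{i=1}^{n} D_r h(Y_i, \mu_n)\, D_s h(Y_i, \mu_n) \]

are the sample estimates of $\Lambda$ and $\Sigma$ respectively.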

Proposition 6.1 uses normal coordinates around some point $p$ to get the asymptotic distribution of the sample mean. The natural candidate for $p$ is the intrinsic mean of $Q$, $\mu_I$. Then we have expressions for the asymptotic parameters $\Lambda$ and $\Sigma$, as the next result shows.

Theorem 6.2 Suppose $Q$ has an intrinsic mean $\mu_I$ and satisfies the following assumption:
A. For any geodesic $\gamma$, $\gamma(0) = \mu_I$, there exists $s(\mu_I) > 0$ such that the cut locus of $\gamma|_{[0, s(\mu_I)]}$, $C(\gamma|_{[0, s(\mu_I)]})$, has $Q$ measure 0.
This is satisfied in particular if the support of $Q$ is contained in a convex open ball of radius $r$ as in Proposition 5.1.
Let $Y_j = \exp_{\mu_I}^{-1} X_j = (Y_j^1, \ldots, Y_j^d)$ be the normal coordinates of the sample around $\mu_I$; $\Sigma$ and $\Lambda$ as defined in Proposition 6.1. Then we have the following expressions:

1. $D_r h(Y_j, \mu) = -2Y_j^r$

2. $E(D_r h(Y_1, \mu)) = 0$

3. $\Sigma_{rs} = 4\,Cov(Y_1^r, Y_1^s)$

If $M$ has constant sectional curvature $C$, then

4. $\Lambda_{rs} = 2E\Big( \dfrac{1 - f(|Y_1|)}{|Y_1|^2}\, Y_1^r Y_1^s + f(|Y_1|)\,\delta_{rs} \Big)$, $\quad |Y_1| = \sqrt{(Y_1^1)^2 + (Y_1^2)^2 + \ldots + (Y_1^d)^2}$

where

\[ f(x) = \begin{cases} 1 & \text{if } C = 0 \\[4pt] \dfrac{\sqrt{C}\,x \cos(\sqrt{C}\,x)}{\sin(\sqrt{C}\,x)} & \text{if } C > 0 \\[8pt] \dfrac{\sqrt{-C}\,x \cosh(\sqrt{-C}\,x)}{\sinh(\sqrt{-C}\,x)} & \text{if } C < 0 \end{cases} \qquad (6.1) \]
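One quick consistency check on (6.1), offered as an aside rather than as part of the original argument: using the standard expansions $t\cot t = 1 - t^2/3 - t^4/45 - \ldots$ and $t\coth t = 1 + t^2/3 - t^4/45 + \ldots$ with $t = \sqrt{|C|}\,x$, both curved cases reduce to the same series in $C$ and $x$.

```latex
% C > 0:  t = \sqrt{C}\,x,   t^2 = C x^2,   f(x) = t \cot t
% C < 0:  t = \sqrt{-C}\,x,  t^2 = -C x^2,  f(x) = t \coth t
\[
  f(x) \;=\; 1 \;-\; \frac{C x^2}{3} \;-\; \frac{C^2 x^4}{45} \;+\; O(x^6)
\]
```

So $f(x) \to 1$ as $C \to 0$ or as $x \to 0$, which agrees with the flat case $C = 0$ and shows that the three branches of (6.1) fit together continuously.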

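Proof Let $\gamma(s)$ be a geodesic, $\gamma(0) = \mu_I$, $s < s(\mu_I)$. For $m \in M \setminus C(\gamma(s))$, define $c_s(t) = c(s, t) = \exp_m(t \exp_m^{-1}\gamma(s))$. For every $s < s(\mu_I)$, $c(s, \cdot)$ is a geodesic joining $m$ and $\gamma(s)$, so that $c(\cdot, \cdot)$ is a variation of $\gamma$ through geodesics. Also since $m \notin C(\gamma(s))$, $c(s, \cdot)$ is the length minimizing curve joining $m$ and $\gamma(s)$. Let $T = \partial_t c(s, t)$, $S = \partial_s c(s, t)$. Then $S(s, \cdot)$ is a family of Jacobi fields along $c(s, \cdot)$. Since $c(s, 0) = m$, $S(s, 0) = 0$, and since $c(s, 1) = \gamma(s)$, $S(s, 1) = \dot\gamma(s)$. $\langle T, T \rangle = d(\gamma(s), m)^2$ is independent of $t$, and $D_t T = 0$. Then

\[
\begin{aligned}
F(\gamma(s)) &= \int_{M \setminus C(\gamma|_{[0, s(\mu_I)]})} d(\gamma(s), m)^2 \, Q(dm) \\
&= \int \langle T, T \rangle \, Q(dm) \\
&= \int \Big( \int_0^1 \langle T, T \rangle \, dt \Big) Q(dm) \qquad (6.2)
\end{aligned}
\]

\[
\begin{aligned}
\Rightarrow \frac{d}{ds} F(\gamma(s)) &= \int \Big( \int_0^1 2\langle D_s T, T \rangle \, dt \Big) Q(dm) \\
&= 2\int \Big( \int_0^1 \frac{d}{dt}\langle T, S \rangle \, dt \Big) Q(dm) \\
&= 2\int \langle T(s, 1), S(s, 1) \rangle \, Q(dm) \qquad (6.3)
\end{aligned}
\]

Here the second equality uses $D_s T = D_t S$ together with $D_t T = 0$, and the third uses $S(s, 0) = 0$. $S(s, 1) = \dot\gamma(s)$ is independent of $m$. So

\[ \frac{d}{ds} F(\gamma(s)) = 2\Big\langle \int_M T(s, 1) \, Q(dm),\; \dot\gamma(s) \Big\rangle \qquad (6.4) \]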

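Since $\mu_I$ is a local minimum for $F$,

\[ \int_M T(0, 1) \, Q(dm) = 0. \]

Since $T(0, 1) = -\exp_{\mu_I}^{-1}(m)$,

\[ \int_M \exp_{\mu_I}^{-1}(m) \, Q(dm) = 0 \qquad (6.5) \]

(6.4) and (6.5) prove 1 and 2. 3 follows from 1.
To find $\Lambda$, we need to find the second order derivatives of $F$. Now

\[
\begin{aligned}
\frac{d^2}{ds^2} F(\gamma(s)) &= 2\int \langle D_s T(s, 1), S(s, 1) \rangle \, Q(dm) \\
&= 2\int \langle D_t S(s, 1), S(s, 1) \rangle \, Q(dm) \qquad (6.6)
\end{aligned}
\]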

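Let $J_s(t) = S(s, t)$. Then $J_s$ is a Jacobi field along $c(s, \cdot)$ with $J_s(0) = 0$, and $J_s(1) = \dot\gamma(s)$ is independent of $m \in M$. Then

\[ \frac{d^2}{ds^2} F(\gamma(s)) = 2\int \langle D_t J_s(1), J_s(1) \rangle \, Q(dm) \qquad (6.7) \]

Suppose $M$ has constant sectional curvature $C$. Let $J_s^{\perp}$ and $J_s^{-}$ be the normal and tangential components of $J_s$. It can be shown that

\[ \langle J_s^{\perp}(1), D_t J_s^{\perp}(1) \rangle = f(|\dot c(s, \cdot)|)\, |J_s^{\perp}(1)|^2 \qquad (6.8) \]

with $f$ as in (6.1).

Write $J_s^{-}(t) = \lambda t\, \dot c(s, t)$ where $\lambda = \dfrac{\langle J_s(1), \dot c(s, 1) \rangle}{|\dot c(s, 1)|^2}$ is independent of $t$. Then

\[ D_t(J_s^{-})(1) = (D_t J_s)^{-}(1) = J_s^{-}(1) = \frac{\langle J_s(1), \dot c(s, 1) \rangle}{|\dot c(s, 1)|^2}\, \dot c(s, 1) \qquad (6.9) \]

So

\[
\begin{aligned}
D_t |J_s^{-}|^2(1) &= 2\lambda^2 |\dot c(s, 1)|^2 = 2\,\frac{\langle J_s(1), \dot c(s, 1) \rangle^2}{|\dot c(s, 1)|^2} \\
&= D_t \langle J_s, J_s^{-} \rangle(1) \\
&= \langle D_t J_s(1), J_s^{-}(1) \rangle + |J_s^{-}(1)|^2 \qquad (6.10)
\end{aligned}
\]

\[
\begin{aligned}
\Rightarrow \langle D_t J_s(1), J_s^{-}(1) \rangle &= 2\,\frac{\langle J_s(1), \dot c(s, 1) \rangle^2}{|\dot c(s, 1)|^2} - |J_s^{-}(1)|^2 \\
&= \frac{\langle J_s(1), \dot c(s, 1) \rangle^2}{|\dot c(s, 1)|^2} \qquad (6.11)
\end{aligned}
\]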

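So

\[
\begin{aligned}
\langle D_t J_s(1), J_s(1) \rangle &= \langle D_t J_s(1), J_s^{-}(1) \rangle + \langle D_t J_s(1), J_s^{\perp}(1) \rangle \\
&= \langle D_t J_s(1), J_s^{-}(1) \rangle + \langle D_t (J_s^{\perp})(1), J_s^{\perp}(1) \rangle \\
&= \frac{\langle J_s(1), \dot c(s, 1) \rangle^2}{|\dot c(s, 1)|^2} + f(|\dot c(s, 1)|)\, |J_s^{\perp}(1)|^2 \\
&= \frac{\langle J_s(1), \dot c(s, 1) \rangle^2}{|\dot c(s, 1)|^2} + f(|\dot c(s, 1)|)\, |J_s(1)|^2 - f(|\dot c(s, 1)|)\, \frac{\langle J_s(1), \dot c(s, 1) \rangle^2}{|\dot c(s, 1)|^2} \\
&= f(|\dot c(s, 1)|)\, |J_s(1)|^2 + \big(1 - f(|\dot c(s, 1)|)\big) \frac{\langle J_s(1), \dot c(s, 1) \rangle^2}{|\dot c(s, 1)|^2} \\
&= f(d(\gamma(s), m))\, |\dot\gamma(s)|^2 + \big(1 - f(d(\gamma(s), m))\big) \frac{\langle J_s(1), \dot c(s, 1) \rangle^2}{|\dot c(s, 1)|^2} \qquad (6.12)
\end{aligned}
\]

From (6.6) and (6.12), we have

\[
\begin{aligned}
\frac{d^2}{ds^2} F(\gamma(s)) &= 2\int \langle D_t J_s(1), J_s(1) \rangle \, Q(dm) \\
&= 2\int \Big( f(d(\gamma(s), m))\, |\dot\gamma(s)|^2 + \big(1 - f(d(\gamma(s), m))\big) \frac{\langle J_s(1), \dot c(s, 1) \rangle^2}{|\dot c(s, 1)|^2} \Big) Q(dm) \qquad (6.13)
\end{aligned}
\]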

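Substituting $s = 0$ in (6.13), we get

\[
\begin{aligned}
\frac{d^2}{ds^2} F(\gamma(s))\Big|_{s=0} &= 2\int \Big( f(|y|)\, |\dot\gamma(0)|^2 + \big(1 - f(|y|)\big) \frac{\langle \dot\gamma(0), y \rangle^2}{|y|^2} \Big) Q_\phi(dy) \\
&= 2|\dot\gamma(0)|^2\, E f(|Y_1|) + 2E\Big( \big(1 - f(|Y_1|)\big) \frac{\langle \dot\gamma(0), Y_1 \rangle^2}{|Y_1|^2} \Big) \qquad (6.14)
\end{aligned}
\]

If we take $\dot\gamma(0) = \sum V^i \partial_i$, then we have

\[ \sum V^i V^j \Lambda_{ij} = \frac{d^2}{ds^2} F(\gamma(s))\Big|_{s=0} \qquad (6.15) \]

So taking $\dot\gamma(0) = \partial_i$, we have $|\dot\gamma(0)|^2 = 1$ and

\[ \Lambda_{ii} = 2E f(|Y_1|) + 2E\Big( \frac{1 - f(|Y_1|)}{|Y_1|^2}\, (Y_1^i)^2 \Big) \qquad (6.16) \]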

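Next taking $\dot\gamma(0) = \partial_i + \partial_j$, $i \ne j$, we get $|\dot\gamma(0)|^2 = 2$ and

\[
\begin{aligned}
\frac{d^2}{ds^2} F(\gamma(s))\Big|_{s=0} &= 4E f(|Y_1|) + 2E\Big( \frac{\big(1 - f(|Y_1|)\big)(Y_1^i + Y_1^j)^2}{|Y_1|^2} \Big) \\
&= \Lambda_{ii} + 2\Lambda_{ij} + \Lambda_{jj}
\end{aligned}
\]

\[ \Rightarrow \Lambda_{ij} = 2E\Big( \frac{1 - f(|Y_1|)}{|Y_1|^2}\, Y_1^i Y_1^j \Big) \]

This gives the expression for $\Lambda$ and hence proves 4.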
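The expressions 3 and 4 of Theorem 6.2 suggest plug-in estimates of $\Sigma$ and $\Lambda$ from the normal coordinates of a sample. The sketch below is one way to compute them for a manifold of constant curvature $C$; it assumes the normal coordinates $Y_j$ around the (estimated) intrinsic mean are already available as lists of floats, and the names `f_curv` and `lambda_sigma_hat` are hypothetical.

```python
import math


def f_curv(x, C):
    """The function f of (6.1); its limiting value 1 is used at x = 0."""
    if x == 0.0 or C == 0.0:
        return 1.0
    if C > 0:
        t = math.sqrt(C) * x
        return t * math.cos(t) / math.sin(t)
    t = math.sqrt(-C) * x
    return t * math.cosh(t) / math.sinh(t)


def lambda_sigma_hat(Y, C):
    """Sample versions of Lambda (item 4) and Sigma (item 3).

    Y: list of normal-coordinate vectors (lists of floats), assumed given.
    """
    n, d = len(Y), len(Y[0])
    mean = [sum(y[r] for y in Y) / n for r in range(d)]
    # Sigma_rs = 4 Cov(Y^r, Y^s), using the 1/n sample covariance.
    sigma = [[4.0 * sum((y[r] - mean[r]) * (y[s] - mean[s]) for y in Y) / n
              for s in range(d)] for r in range(d)]
    lam = [[0.0] * d for _ in range(d)]
    for y in Y:
        norm2 = sum(c * c for c in y)
        fy = f_curv(math.sqrt(norm2), C)
        # (1 - f)/|y|^2 stays bounded as y -> 0 and multiplies y^r y^s = 0 there.
        w = (1.0 - fy) / norm2 if norm2 > 0 else 0.0
        for r in range(d):
            for s in range(d):
                delta = fy if r == s else 0.0
                lam[r][s] += 2.0 * (w * y[r] * y[s] + delta) / n
    return lam, sigma
```

For $C = 0$ the estimate of $\Lambda$ reduces to twice the identity matrix, as expected in Euclidean space.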

To construct an asymptotic confidence set for the population mean $\mu_I$, we can consider the statistic

\[ T_n = d_g^2(\mu_{nI}, \mu_I) \]

Taking $\phi = Exp_{\mu_I}^{-1}$ in Proposition 6.1, $T_n = \|\mu_n\|^2$. Then from Proposition 6.1,

\[ nT_n \xrightarrow{\;L\;} \sum_{i=1}^{d} \lambda_i Z_i^2 \qquad (6.17) \]

where $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_d$ are the eigenvalues of $\Lambda^{-1}\Sigma\Lambda^{-1}$ and $Z_1, \ldots, Z_d$ are iid $N(0, 1)$. So an asymptotic level $(1 - \alpha)$ confidence set for $\mu_I$ can be given by:

\[ \{ \mu_I : n\, d_g^2(\mu_{nI}, \mu_I) \le \hat c_{1-\alpha} \} \qquad (6.18) \]

where $\hat c_{1-\alpha}$ is the estimated upper $(1 - \alpha)$ quantile of the distribution of $\sum_{i=1}^{d} \hat\lambda_i Z_i^2$, where the $\hat\lambda_i$ are the eigenvalues estimated from the sample $X_1, \ldots, X_n$ using Theorem 6.2 and $(Z_1, Z_2, \ldots)$ is a sample of iid $N(0, 1)$ independent of the sample $X_1, \ldots, X_n$.
The corresponding bootstrapped confidence region is given by

\[ \{ \mu_I : n\, d_g^2(\mu_{nI}, \mu_I) \le c^*_{1-\alpha} \} \qquad (6.19) \]

where $c^*_{1-\alpha}$ is the upper $(1 - \alpha)$-quantile of the bootstrapped values of the statistic $nT_n$.
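The cutoff in (6.18) is a quantile of a weighted sum of independent chi-squared variables, which has no simple closed form in general. One possible way to estimate it, sketched below, is plain Monte Carlo simulation of the $Z_i$; the text does not say how the quantile was obtained, so this approach, the function name `weighted_chi2_quantile` and its default settings are assumptions.

```python
import math
import random


def weighted_chi2_quantile(lams, level=0.95, draws=20000, seed=0):
    """Monte Carlo estimate of the `level` quantile of sum_i lams[i] * Z_i^2.

    lams: estimated eigenvalues (assumed given); Z_i are simulated iid N(0, 1).
    """
    rng = random.Random(seed)
    sims = sorted(
        sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lams)
        for _ in range(draws)
    )
    # Empirical quantile: smallest simulated value with at least `level` mass.
    index = min(draws - 1, max(0, math.ceil(level * draws) - 1))
    return sims[index]


# With a single unit eigenvalue this approximates the chi-squared(1) quantile.
print(weighted_chi2_quantile([1.0]))
```

With a single eigenvalue equal to 1 the estimate should land near 3.84, the 0.95 quantile of the chi-squared distribution with one degree of freedom, up to simulation error.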


7 Applications

Now we will apply the above results to some Riemannian manifolds, including the Planar Shape Space.

7.1 Unit sphere $S^{k-1}$

At each $p \in S^{k-1}$, endow the tangent space $T_p = \{v \in \mathbb{R}^k : v \cdot p = 0\}$ with the metric tensor $g_p : T_p \times T_p \to \mathbb{R}$ as the restriction of the scalar product at $p$ of the tangent space of $\mathbb{R}^k$: $g_p(v_1, v_2) = \langle v_1, v_2 \rangle$. Then $g$ is a smooth metric tensor on the tangent bundle $TS^{k-1} = \{(p, v) : p \in S^{k-1}, v \in \mathbb{R}^k, v \cdot p = 0\}$. The geodesics are the great circles,

\[ \gamma_{p,v}(t) = (\cos t)\,p + (\sin t)\,\frac{v}{\|v\|}, \qquad -\pi < t \le \pi \qquad (7.1) \]

The exponential map, $Exp_p : T_p \to S^{k-1}$, is

\[ Exp_p(0) = p, \qquad Exp_p(v) = \cos(\|v\|)\,p + \sin(\|v\|)\,\frac{v}{\|v\|} \quad (v \in T_p,\ v \ne 0) \qquad (7.2) \]

The cut locus of $p$ is $C(p) = \{-p\}$. The inverse of the exponential map on $S^{k-1} \setminus \{-p\}$ into $T_p$ is

\[ Exp_p^{-1}(q) = \frac{\arccos\langle p, q \rangle}{\sqrt{1 - \langle p, q \rangle^2}}\,[q - \langle p, q \rangle p] \quad (q \ne p, -p), \qquad Exp_p^{-1}(p) = 0 \qquad (7.3) \]

The geodesic distance $d_g$ on $S^{k-1}$ is

\[ d_g(p, q) = |\arccos\langle p, q \rangle| \in [0, \pi] \]

This space has constant sectional curvature 1. So if $Q$ is a probability measure on $S^{k-1}$, $Q$ has an intrinsic mean if its support is contained in a geodesic ball of radius at most $\pi/4$ (see Proposition 5.1). In this case the sample Intrinsic mean (i.e., any measurable selection from the sample intrinsic mean set) based on a random sample from $Q$ is consistent.
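The maps (7.2) and (7.3) and the geodesic distance translate directly into code. The following is a minimal sketch using plain Python lists for points and tangent vectors; the function names are hypothetical, and inner products are clamped to $[-1, 1]$ only to guard against rounding.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sphere_exp(p, v):
    """Exp_p(v) of (7.2) for a tangent vector v at p (v . p = 0)."""
    nv = math.sqrt(dot(v, v))
    if nv == 0.0:
        return list(p)
    return [math.cos(nv) * a + math.sin(nv) * b / nv for a, b in zip(p, v)]


def sphere_log(p, q):
    """Exp_p^{-1}(q) of (7.3); undefined on the cut locus q = -p."""
    c = max(-1.0, min(1.0, dot(p, q)))
    if c >= 1.0:
        return [0.0] * len(p)
    if c <= -1.0:
        raise ValueError("q = -p lies on the cut locus of p")
    scale = math.acos(c) / math.sqrt(1.0 - c * c)
    return [scale * (b - c * a) for a, b in zip(p, q)]


def sphere_dist(p, q):
    """Geodesic distance arccos<p, q> in [0, pi]."""
    return math.acos(max(-1.0, min(1.0, dot(p, q))))


p, q = [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]
v = sphere_log(p, q)
print(sphere_dist(p, q), sphere_exp(p, v))
```

Applying `sphere_exp` to the output of `sphere_log` recovers the original point, and the norm of the log vector equals the geodesic distance.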


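In case $Q$ has a mean $\mu_I$, pick an orthonormal basis for $T_{\mu_I}S^{k-1}$: $v_1, \ldots, v_{k-1}$. For $x \in S^{k-1}$, $|\langle x, \mu_I \rangle| < 1$, in Theorem 6.2, we have

\[ \phi(x) = \exp_{\mu_I}^{-1}(x) = \frac{\arccos\langle x, \mu_I \rangle}{\sqrt{1 - \langle x, \mu_I \rangle^2}}\,[x - \langle x, \mu_I \rangle \mu_I] \qquad (7.4) \]

Let $y = (y^1, \ldots, y^{k-1})$, with $\phi(x) = \sum_r y^r v_r$, denote the normal coordinates of $x$. Then

\[ y^r = \frac{\arccos\langle x, \mu_I \rangle}{\sqrt{1 - \langle x, \mu_I \rangle^2}}\,\langle x, v_r \rangle, \qquad r = 1, 2, \ldots, k - 1. \qquad (7.5) \]

\[ |y| = \arccos\langle x, \mu_I \rangle = d_g(x, \mu_I) \]

$Q$ satisfies assumption (A) if all one dimensional curves have measure 0. This is true in particular if $Q$ is absolutely continuous with respect to the volume measure. Then

\[ f(p) = \frac{p \cos p}{\sin p} \qquad (7.6) \]

\[
\begin{aligned}
\Lambda_{rs} = 2E\Big[ &\frac{1}{1 - \langle X_1, \mu_I \rangle^2} \Big( 1 - \frac{\arccos\langle X_1, \mu_I \rangle}{\sqrt{1 - \langle X_1, \mu_I \rangle^2}}\,\langle X_1, \mu_I \rangle \Big) \langle X_1, v_r \rangle \langle X_1, v_s \rangle \\
&+ \frac{\arccos\langle X_1, \mu_I \rangle}{\sqrt{1 - \langle X_1, \mu_I \rangle^2}}\,\langle X_1, \mu_I \rangle\, \delta_{rs} \Big], \qquad 1 \le r \le s \le k - 1. \qquad (7.7)
\end{aligned}
\]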

7.2 Planar Shape Space $\Sigma_2^k$

Consider first the complex projective space $CP^d$: the space of all complex lines through the origin in $\mathbb{C}^{d+1}$. Consider the map

\[ \pi : CS^d \to CP^d, \qquad z \mapsto \pi(z) = [z]; \qquad z \in \mathbb{C}^{d+1},\ \|z\|^2 = \sum_{j=1}^{d+1} |z_j|^2 = 1 \]

This $\pi$ is a Riemannian submersion. The tangent space $T_zCS^d$ at $z$ is the set of all vectors $v$ in $\mathbb{C}^{d+1}$ orthogonal to $z$. Here for $v, w \in \mathbb{C}^{d+1}$, $\langle v, w \rangle = Re(v'w)$.

Then for any $[z] \in CP^d$, the tangent space $T_{[z]}CP^d$ at $[z]$ is isomorphic with a subspace of $T_zCS^d$ called the horizontal subspace,

\[ H_z = \{v \in \mathbb{C}^{d+1} : z'v = 0\} \]

Note that

\[ T_zCS^d = \{v \in \mathbb{C}^{d+1} : Re(z'v) = 0\} \]

It can be shown that $\exp_{[z]} = \pi \circ \exp_z \circ\, d\pi_z^{-1}$, and

\[ d\pi_z^{-1}(\exp_{[z]}^{-1}([w])) = \frac{r}{\sin r}\,\{-z \cos r + e^{i\theta} w\} \in H_z \qquad (7.8) \]

where $z, w \in CS^d$,

\[ r = d_g([z], [w]) = \arccos(|z'w|) \in [0, \tfrac{\pi}{2}) \qquad (7.9) \]

and $e^{i\theta} = \dfrac{z'w}{|z'w|}$.

$\Sigma_2^k$ may be identified with the set of all complex lines in $H^{k-1} \equiv \{w \in \mathbb{C}^k \setminus \{0\} : \sum_1^k w_j = 0\}$, with a k-ad $z = (z_1, z_2, \ldots, z_k) \equiv (x_1 + iy_1, \ldots, x_k + iy_k)$ represented by $z - \langle z \rangle$. So one may express the geodesics, geodesic distances, the exponential map and its inverse in $\Sigma_2^k$ by simply taking $d = k - 1$ above and replacing the k-ads $z, w$ by their preshapes (see Section 2.3). In view of the restriction to the complex hyperplane (of complex dimension $k - 1$), the tangent space $T_{[z]}\Sigma_2^k$ at the shape $[z]$ is isomorphic with a subspace of $H_z$, namely

\[ \{v \in \mathbb{C}^k : z'v = 0,\ v'1_k = 0\} \]

$\Sigma_2^k$ is then isomorphic (and isometric) to $CP^{k-2}$. This space has all sectional curvatures bounded between 1 and 4. The cut locus of $[p]$ is

\[ C([p]) = \{[x] : d_g([x], [p]) = \arccos(|p'x|) = \tfrac{\pi}{2}\} = \{[x] : p'x = 0\} \]

From Proposition 5.1, $Q$ has an intrinsic mean if its support is contained in a geodesic ball of radius at most $\frac{\pi}{8}$.

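In case $Q$ has intrinsic mean $[\mu]$, consider the coordinate patch $\phi([z]) = d\pi_\mu^{-1}(\exp_{[\mu]}^{-1}[z])$ around $[\mu]$. From (7.8), we get

\[ \phi : \Sigma_2^k \setminus C([\mu]) \to H_\mu, \qquad \phi([z]) = \frac{r(z)}{\sin r(z)}\,[-\mu \cos r(z) + e^{i\theta(z)} z] \qquad (7.10) \]

where $z, \mu \in CS^{k-1} \cap H^{k-1}$,

\[ r(z) = d_g([z], [\mu]) = \arccos(|\mu'z|) \in [0, \tfrac{\pi}{2}) \]

and $e^{i\theta(z)} = \dfrac{\mu'z}{|\mu'z|}$.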

$Q$ satisfies assumption (A) if it is absolutely continuous wrt the volume measure on the shape space. If $[X_1], [X_2], \ldots, [X_n]$ is an iid sample from $Q$, the $X_j$'s being the preshapes, let $Y_j = \phi([X_j])$. Then in Theorem 6.2,

\[ Dh(Y_1, \mu) = -2Y_1 = \frac{-2r_1}{\sin r_1}\,[-\cos r_1\, \mu + e^{i\theta_1} X_1] \qquad (7.11) \]

where $r_1 = \arccos(|\mu'X_1|)$ and $e^{i\theta_1} = \dfrac{\mu'X_1}{\cos r_1}$, and

\[ E(Dh(Y_1, \mu)) = 0 \]

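So to find $\mu$, we need to find the zeros of the function $\mu \mapsto E(Dh(Y_1, \mu))$. This is equivalent to finding the fixed points of the map

\[ f : \pi^{-1}\Sigma_2^k \to CS^{k-1}, \qquad \mu \mapsto \exp_\mu d\pi_{\pi(\mu)}^{-1}\big(-grad\,F(\pi(\mu))\big) \]

Here $\exp_\mu$ is the exponential map on $CS^{k-1}$,

\[ \exp_\mu : T_\mu CS^{k-1} \to CS^{k-1}, \qquad \exp_\mu(v) = (\cos|v|)\,\mu + \frac{\sin|v|}{|v|}\,v. \]

So

\[ f(\mu) = (\cos|v|)\,\mu + \frac{\sin|v|}{|v|}\,v \qquad (7.9) \]

where $v = 2E\Big[\dfrac{r_1}{\sin r_1}\,(-\cos r_1\, \mu + e^{i\theta_1} X_1)\Big]$.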

A result (Theorem 4, LE [4]) says that if the support of $Q$ is contained in a geodesic ball of radius at most $\frac{3\pi}{40}$, then $f$ has a unique fixed point $\mu$, and then $[\mu]$ is the intrinsic mean of $Q$.

The above result can be used to find the intrinsic sample mean by replacing $Q$ by the empirical distribution $Q_n$. That is, start with some $\theta_0 \in CS^{k-1} \cap H^{k-1}$ and compute $\theta_m$ iteratively:

\[ \theta_{m+1} = f(\theta_m) = (\cos|v_m|)\,\theta_m + \frac{\sin|v_m|}{|v_m|}\,v_m; \qquad m = 0, 1, 2, \ldots \qquad (7.10) \]

where

\[ v_m = 2\sum_{j=1}^{n} \frac{1}{n}\,\frac{r_j}{\sin r_j}\,(-\cos r_j\, \theta_m + e^{i\theta_j} X_j), \qquad r_j = \arccos(|\theta_m'X_j|), \qquad e^{i\theta_j} = \frac{\theta_m'X_j}{\cos r_j} \]

If all the sample points are in a geodesic ball of radius at most $\frac{3\pi}{40}$, then the above algorithm converges to $\mu_n$, $[\mu_n]$ being the sample intrinsic mean, whatever $\theta_0$ we start with in that ball. We may take $[\theta_0]$ to be the sample extrinsic mean.
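The iteration can be sketched in a few lines with Python's built-in complex numbers. This is an illustration under stated assumptions, not the implementation used for the skull data. The rotation factor is chosen so that the rotated preshape is optimally aligned with the current iterate, which makes the update direction horizontal; how that corresponds to the primed products in the text depends on the conjugation convention, which is assumed here. The update direction is `step` times the average of the log-map vectors, with a default `step=1.0` chosen for this sketch; `step=2.0` corresponds to the factor 2 in $v_m$ as written above. All function names and the toy landmark data are hypothetical.

```python
import math


def hdot(z, w):
    """Hermitian product sum_j conj(z_j) * w_j."""
    return sum(a.conjugate() * b for a, b in zip(z, w))


def vnorm(v):
    return math.sqrt(sum(abs(a) ** 2 for a in v))


def preshape(k_ad):
    """Center a k-ad (list of complex landmarks) and scale it to unit norm."""
    center = sum(k_ad) / len(k_ad)
    u = [z - center for z in k_ad]
    nrm = vnorm(u)
    return [a / nrm for a in u]


def log_map(mu, x):
    """Horizontal lift at mu of exp^{-1}_{[mu]}([x]); undefined on the cut locus."""
    h = hdot(mu, x)
    rho = min(abs(h), 1.0)              # cos r
    r = math.acos(rho)
    if r < 1e-12:
        return [0j] * len(mu)
    rot = h.conjugate() / abs(h)        # rotates x into alignment with mu
    s = r / math.sin(r)
    return [s * (rot * b - rho * a) for a, b in zip(mu, x)]


def exp_map(mu, v):
    """Exponential map on the preshape sphere."""
    nv = vnorm(v)
    if nv < 1e-15:
        return list(mu)
    return [math.cos(nv) * a + math.sin(nv) / nv * b for a, b in zip(mu, v)]


def intrinsic_mean(preshapes, start, step=1.0, tol=1e-12, max_iter=200):
    """Fixed-point iteration theta <- exp_theta(step * average log vector)."""
    theta = list(start)
    n = len(preshapes)
    for _ in range(max_iter):
        logs = [log_map(theta, x) for x in preshapes]
        v = [step * sum(l[j] for l in logs) / n for j in range(len(theta))]
        theta = exp_map(theta, v)
        if vnorm(v) < tol:
            break
    return theta


# Toy triangles (hypothetical landmark data, not the skull data).
shapes = [
    [0j, 1 + 0j, 0.5 + 0.8j],
    [0.02j, 1 + 0j, 0.5 + 0.85j],
    [0j, 1.05 + 0j, 0.45 + 0.8j],
    [-0.03 + 0j, 1 + 0.02j, 0.5 + 0.78j],
]
X = [preshape(s) for s in shapes]
mu_hat = intrinsic_mean(X, X[0])
print(mu_hat)
```

At the returned preshape the average of the log-map vectors is numerically zero, which is the sample version of the condition $E(Dh(Y_1, \mu)) = 0$.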

For the skull data, the female sample is contained in a geodesic ball of radius 0.0703 while the male sample is contained in a geodesic ball of radius 0.0855 around their respective sample extrinsic means. Since both radii are well below $3\pi/40 \approx 0.2356$, their sample intrinsic means exist and the above algorithm converges to them. The geodesic distances between the extrinsic and intrinsic means are of the order of $10^{-6}$:

\[ d_g(\mu_{nE}, \mu_{nI}) = 5.5395 \times 10^{-7}, \qquad d_g(\mu_{mE}, \mu_{mI}) = 1.9609 \times 10^{-6} \]

Here $\mu_{nE}$ and $\mu_{mE}$ denote the preshapes of the extrinsic sample means of the females and males respectively, while $\mu_{nI}$ and $\mu_{mI}$ denote the corresponding sample intrinsic means.


Using the sample estimates, we can construct 95% asymptotic confidence regions of the population intrinsic means as in (6.19). They are

\[ \text{Females: } \{[\mu_1] : n\, d_g^2(\mu_{nI}, \mu_1) \le 0.0003247\} \qquad (7.10) \]

\[ \text{Males: } \{[\mu_2] : m\, d_g^2(\mu_{mI}, \mu_2) \le 0.0004691\} \qquad (7.11) \]

Here $n = 30$ and $m = 29$ are the female and male sample sizes; $\mu_1$ and $\mu_2$ are the preshapes of their population intrinsic means.

The confidence regions in (7.10) and (7.11) can be used to test if the male and female populations have the same intrinsic mean shape. We accept $H_0 : [\mu_1] = [\mu_2]$ if the regions overlap, that is if

\[ d_g(\mu_{nI}, \mu_{mI}) < \sqrt{\frac{0.0003247}{n}} + \sqrt{\frac{0.0004691}{m}} \]

For this sample, $d_g(\mu_{nI}, \mu_{mI}) = 0.0587$ while $\sqrt{\frac{0.0003247}{n}} + \sqrt{\frac{0.0004691}{m}} = 0.0073$.

So we reject $H_0$.
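The arithmetic behind this decision can be reproduced directly from the numbers quoted above. The short check below is only a sketch; the function name `regions_overlap` is hypothetical.

```python
import math


def regions_overlap(dist, c1, n, c2, m):
    """Compare dist with the sum of the two region radii sqrt(c/size)."""
    threshold = math.sqrt(c1 / n) + math.sqrt(c2 / m)
    return dist < threshold, threshold


overlap, threshold = regions_overlap(0.0587, 0.0003247, 30, 0.0004691, 29)
print(overlap, round(threshold, 4))  # False 0.0073
```

Since the observed distance 0.0587 is far larger than the threshold, the two regions do not overlap.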

8 Conclusion

There are many outstanding statistical problems in shape analysis which remain unresolved. One of them is a proper analysis of 3-D shape spaces and of distributions on them. Another is the study of time series for evolution of the distribution of shapes, in discrete and in continuous time. For 2-D shape spaces and other Riemannian manifolds, a mathematical problem of some significance is to find broad conditions for the uniqueness of the intrinsic mean.

9 Acknowledgements

I am very thankful to my advisor, Dr. Rabi Bhattacharya, for guiding me throughout my research. I would also like to thank other professors for their thoughtful and constructive suggestions.


10 References

[1] BHATTACHARYA, R. and PATRANGENARU, V. (2003). Large Sample Methods of Intrinsic and Extrinsic Sample Means on Manifolds-I. Ann. Statist. 31 1-29.

[2] BHATTACHARYA, R. and PATRANGENARU, V. (2005). Large Sample Methods of Intrinsic and Extrinsic Sample Means on Manifolds-II. Ann. Statist. 33 1225-1259.

[3] KARCHER, H. (1977). Riemannian Center of Mass & Mollifier Smoothing. Comm. on Pure & Applied Math. XXX 509-541.

[4] LE, HUILING (2001). Locating Frechet Means with Application to Shape Spaces. Adv. Appl. Prob. 33 324-338.

[5] PENNEC, XAVIER (1999). Probabilities and Statistics on Riemannian Manifolds: Basic Tools for Geometric Measurements. NSIP'99.

[6] DRYDEN, I.L. and MARDIA, K.V. (1998). Statistical Shape Analysis. Wiley N.Y.

[7] KENDALL, D.G.; BARDEN, D.; CARNE, T.K. and LE, H. (1999). Shape & Shape Theory. Wiley N.Y.

[8] LEE, J.M. (1997). Riemannian Manifolds: An Introduction to Curvature. Springer.

Figure 1: Procrustes Coordinates of Female landmarks. * denotes coordinates of the mean shape.

Figure 2: Procrustes Coordinates of Male landmarks. * denotes coordinates of the mean shape.

Figure 3: Procrustes Coordinates of sample means. (Legend: Female, Male, Pooled.)
