Analysis of Non-Euclidean Data: Use of Differential Geometry in Statistics
Rabi Bhattacharya, The University of Arizona, Tucson, AZ
[Research supported by NSF grant DMS1406872]
June, 2016
Based on joint work with A. Bhattacharya (BB), Lizhen Lin and V. Patrangenaru (BP)
Introduction | Frechet Mean on Metric Spaces
Examples and Applications | Nonparametric Bayes Theory on Manifolds
CONTENTS
1. Introduction
2. Frechet Means of Probabilities on Metric Spaces: Uniqueness; Consistency & CLT for Sample Frechet Means.
3. Applications and Examples.
(a) $S^d$ (Sphere): Paleomagnetism.
(b) Kendall's Planar Shape Space $\Sigma_2^k$: Two-Sample Tests for (1) Schizophrenia and (2) Male & Female Gorilla Skulls.
(c) 3D Shape Space $R\Sigma_3^k$: Match Pair Test for Glaucoma.
(d) The space $\mathrm{Sym}^+(p)$ of positive definite matrices.
(e) Stratified Spaces: (1) $\Sigma_m^k$ ($m > 2$), (2) Open Book.
4. Nonparametric Bayes Theory on Manifolds & Applications.
GEOMETRY. A manifold $M$ of dimension $d$: a metric space with each point having a neighborhood diffeomorphic to an open set in $\mathbb{R}^d$; these maps on intersecting neighborhoods are smoothly connected.
EXAMPLE 1. Sphere $S^d = \{x \in \mathbb{R}^{d+1} : \|x\| = 1\}$ ($d \ge 1$). (Covered by two stereographic maps.)
Extrinsic, or chord, distance $d(p, q) = \|p - q\|$ (Euclidean distance inherited from an embedding $J : M \to \mathbb{R}^N$). On $S^d$, $J$ is the inclusion map $S^d \to \mathbb{R}^{d+1}$.
Geometry
A tangent vector $v$ at $p$ is the derivative $dc(t)/dt$ at $t = 0$ of a smooth curve $c(t)$, $0 \le t \le a$, with $c(0) = p$, on $M$ (computed in local coordinates in a neighborhood of $p$). The set of tangent vectors at $p$ is a $d$-dimensional vector space $T_p(M)$. $T_p(S^d) = \{v \in \mathbb{R}^{d+1} : v \text{ orthogonal to } p\}$.
Geometry
Geodesic, or intrinsic, distance $\rho_g(p, q)$: the infimum of the arc lengths of smooth curves from $p$ to $q$ [depends on a metric tensor $g$ providing inner products smoothly on the tangent spaces of $M$]. The arc length of a curve $c(t)$, $0 \le t \le a$, from $p$ to $q$ is $\int_0^a |dc(t)/dt|\,dt$. On $S^d$, $\rho_g(p, q)$ is the length of the shorter arc of the great circle joining $p$ and $q$.
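To make the two distances on $S^d$ concrete, here is a minimal sketch in Python with NumPy. The function names are illustrative assumptions and not part of the talk; the only formulas used are the chord distance $\|p - q\|$ and the great-circle arc length $\arccos\langle p, q\rangle$ for unit vectors.

```python
import numpy as np

def chord_distance(p, q):
    """Extrinsic (chord) distance ||p - q|| between two unit vectors."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.linalg.norm(p - q))

def geodesic_distance(p, q):
    """Intrinsic (great-circle) distance arccos(<p, q>) between two unit vectors."""
    c = float(np.dot(p, q))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

# Two orthogonal points on S^2.
p = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 0.0])
rho_chord = chord_distance(p, q)
rho_geo = geodesic_distance(p, q)
```

The two are related by chord $= 2\sin(\rho_g/2)$, so they agree to first order for nearby points and differ most at antipodal points.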
Geometry
Geodesics on $M$ (locally) minimize geodesic distances between points. A geodesic from $p$ is entirely determined by (the initial point $p$ and) a tangent vector $v$ at $p$. On $S^d$ the geodesics are the great circles.
The cut point of $p$ along a geodesic from $p$ is the point beyond which the geodesic arc is no longer distance minimizing. The cut locus of $p$ is the collection of all cut points of $p$. On $S^d$ the cut point of $p$ (along each geodesic) is $-p$ (the antipodal point), so the cut locus of $p$ is $\{-p\}$.
Example 2.
EXAMPLE 2. $M = S^d/G$: the space of orbits of $S^d$ under a (Lie) group $G$ of isometries of $S^d$.
For $p \in S^d$, $[p] = \{hp : h \in G\}$ is the orbit of $p$, and $M = \{[p] : p \in S^d\}$.
Intrinsic distance $\rho_g([p], [q]) = \inf\{\rho_g(hp, h'q) : h, h' \in G\}$.
Example 2(a). Axial Space $\mathbb{RP}^d$
Axial space $\mathbb{RP}^d$ = {set of all lines passing through the origin in $\mathbb{R}^{d+1}$}, also identified as {the set of pairs of points $(p, -p)$ : $p \in S^d$}, and as $S^d/G$, where $G = \{h, \mathrm{Id}\}$, with $hp = -p$.
The cut locus of a point can be identified with $\mathbb{RP}^{d-1}$.
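For the axial space, the orbit-space distance $\inf\{\rho_g(hp, h'q)\}$ reduces to a minimum over the two signs, which equals $\arccos|\langle p, q\rangle|$. A small sketch, assuming axes are represented by unit vectors (the function name is hypothetical):

```python
import numpy as np

def axial_geodesic_distance(p, q):
    """Intrinsic distance on RP^d between the axes spanned by unit vectors p and q.

    With G = {Id, h}, hp = -p, the infimum over orbits is
    min(arccos<p, q>, arccos<p, -q>) = arccos|<p, q>|.
    """
    c = abs(float(np.dot(p, q)))
    return float(np.arccos(min(c, 1.0)))

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
d_same_axis = axial_geodesic_distance(e1, -e1)   # p and -p are the same axis
d_orthogonal = axial_geodesic_distance(e1, e2)   # the largest possible distance
```

The maximal distance $\pi/2$ is attained exactly on the axes orthogonal to $p$, which is the copy of $\mathbb{RP}^{d-1}$ identified above as the cut locus.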
Example 2(b): Kendall's planar shape space $\Sigma_2^k$ ($k > 2$).
A k-ad is a set of $k$ points $\{(x_1, y_1), \dots, (x_k, y_k)\}$ in $\mathbb{R}^2$, not all the same.
$\Sigma_2^k$ is the set of all k-ads modulo translation, scaling and rotation in the plane.
That is, first subtract from each $(x_i, y_i)$ the mean of the $k$ points; then divide the centered vector by its Euclidean norm to get a point on the pre-shape sphere, identified as $S^{2k-3}$. Then let $\Sigma_2^k = S^{2k-3}/G$, where $G = SO(2)$, the group of rotations in the plane (a Lie group of isometries of dimension 1).
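The centering and scaling steps can be sketched as follows, representing each planar landmark as a complex number. This is an illustrative sketch only; the function name and the complex representation are assumptions made for the example.

```python
import numpy as np

def planar_preshape(points):
    """Map a k-ad in R^2 (array of shape (k, 2)) to its preshape.

    Returns a centered complex vector of length k with unit norm,
    i.e. a point on the pre-shape sphere S^{2k-3}.
    """
    pts = np.asarray(points, float)
    z = pts[:, 0] + 1j * pts[:, 1]
    z = z - z.mean()                      # remove translation
    norm = np.linalg.norm(z)
    if norm == 0.0:
        raise ValueError("the k points must not all coincide")
    return z / norm                       # remove scale

triangle = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
u = planar_preshape(triangle)
# Translating and rescaling the k-ad leaves the preshape unchanged.
u_moved = planar_preshape(3.0 * triangle + np.array([5.0, -2.0]))
```

A rotation of the k-ad by angle $\theta$ multiplies the preshape by $e^{i\theta}$, which is the $SO(2)$ orbit that is quotiented out to obtain the shape.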
Example 2(b): Kendall's shape space $\Sigma_2^k$.
Hence $\Sigma_2^k$ has dimension $2k - 4$ and is called Kendall's space of planar shapes.
When the k-ads are represented as points on the complex plane and are centered, each lies in a space isomorphic to $\mathbb{C}^{k-1}$, and the scalings and rotations of a point $p$ in $\mathbb{C}^{k-1}$ can be represented as $\{\lambda p : \lambda \in \mathbb{C}, \lambda \ne 0\}$, i.e., a complex line passing through the origin and $p$. The space of all such lines is the complex projective space $\mathbb{CP}^{k-2}$. The cut locus of a point can be identified with $\mathbb{CP}^{k-3}$, that is, with $\Sigma_2^{k-1}$.
FRECHET MEANS ON METRIC SPACES
The Frechet function of a probability distribution $Q$ is
$F(p) = \int \rho^2(p, x)\, Q(dx)$, $p \in M$.
The Frechet mean set is the set of minimizers of $F$. A unique minimizer is called the Frechet mean of $Q$, say $\mu$. The sample Frechet mean $\mu_n$ is a measurable selection from the mean set of the empirical $Q_n$ based on i.i.d. $X_1, \dots, X_n \sim Q$.
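As an illustration of these definitions, the sketch below evaluates the empirical Frechet function for an arbitrary distance and minimizes it over a finite grid, using the circle $S^1$ with arc-length distance as the metric space. The grid search and all names are assumptions chosen for simplicity; this is not the estimation method of the talk.

```python
import numpy as np

def frechet_function(p, sample, rho):
    """Empirical Frechet function F_n(p) = (1/n) * sum_i rho(p, x_i)^2."""
    return float(np.mean([rho(p, x) ** 2 for x in sample]))

def circle_distance(a, b):
    """Geodesic distance between two angles (in radians) on the circle S^1."""
    d = abs(a - b) % (2 * np.pi)
    return min(d, 2 * np.pi - d)

def sample_frechet_mean_on_grid(sample, rho, grid):
    """Grid point with the smallest empirical Frechet function value."""
    values = [frechet_function(p, sample, rho) for p in grid]
    return float(grid[int(np.argmin(values))])

angles = [0.1, 0.2, 0.3]
grid = np.linspace(-np.pi, np.pi, 2000, endpoint=False)
mu_n = sample_frechet_mean_on_grid(angles, circle_distance, grid)
```

For data concentrated in a short arc, as here, the minimizer agrees (up to grid resolution) with the ordinary average of the angles; for widely spread data the minimizer need not be unique.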
FRECHET MEANS ON METRIC SPACES
Proposition 1 (Ziezold, 1977; BP, 2003). Let $F$ be finite. (i) Then the Frechet mean set is nonempty and compact. (ii) In the case of a unique minimizer $\mu$, $\mu_n \to \mu$ (with probability one).
Remark 1. The extrinsic mean, based on $\rho$ inherited from a Euclidean space $E^N$ via an embedding
$J : M \to E^N$,
is given by $\mu = J^{-1}(P_{J(M)}\,\mu_J(Q))$, if the projection $P_{J(M)}\mu_J$ of the Euclidean mean $\mu_J = \mu_J(Q)$ of $Q \circ J^{-1}$ on $J(M)$ is unique.
If $M$ is Riemannian and $\rho$ is the geodesic distance, then the Frechet minimizer is called the intrinsic mean (if unique; conditions for uniqueness are an open problem).
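For the sphere with the inclusion embedding, the recipe in Remark 1 is easy to sketch: take the Euclidean mean of the embedded sample and project it back to the sphere by normalizing. The projection is unique exactly when the Euclidean mean is nonzero. The function name and tolerance below are assumptions for illustration.

```python
import numpy as np

def sphere_extrinsic_mean(sample, tol=1e-12):
    """Sample extrinsic mean on S^d under the inclusion J: S^d -> R^{d+1}.

    sample: array of shape (n, d + 1) whose rows are unit vectors.
    Projects the Euclidean sample mean onto the sphere; raises if the
    mean is (numerically) zero, where the projection is not unique.
    """
    xbar = np.mean(np.asarray(sample, float), axis=0)
    norm = np.linalg.norm(xbar)
    if norm < tol:
        raise ValueError("Euclidean mean is zero; extrinsic mean is not unique")
    return xbar / norm

data = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
mu_E = sphere_extrinsic_mean(data)
```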
FRECHET MEANS ON METRIC SPACES
Remark 2. The embedding $J$ considered here is equivariant under the action of a large Lie group $G$: there exists a group homomorphism $\Phi : G \to GL(N, E^N)$, $g \mapsto \Phi(g)$, such that
$\Phi(g)(J(x)) = J(gx)$, for all $g \in G$, $x \in M$.
AN OMNIBUS CENTRAL LIMIT THEOREM (Bhattacharya and Lin (2016))
We make the following assumptions.
(A1) The Frechet mean $\mu$ of $Q$ is unique.
(A2) $\mu \in G$, $G \subset M$, and there exists a homeomorphism $\phi : G \to U$, with $U$ an open subset of $\mathbb{R}^s$ ($s \ge 1$), such that
$x \mapsto h(x; q) := \rho^2(\phi^{-1}(x), q)$ (1)
is $C^2$ on $U$, for every $q$ outside a $Q$-null set.
(A3) $P(\mu_n \in G) \to 1$ as $n \to \infty$.
AN OMNIBUS CENTRAL LIMIT THEOREM
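(A4) Let $D_r h(x; q) = \partial h(x; q)/\partial x_r$, $r = 1, \dots, s$, and let $D_{r,r'} = D_r D_{r'}$ denote the second-order partial derivatives. Then
$E|D_r h(\phi(\mu); Y_1)|^2 < \infty$, $E|D_{r,r'} h(\phi(\mu); Y_1)| < \infty$, $r, r' = 1, \dots, s$. (2)
(A5) Let $u_{r,r'}(\varepsilon; q) = \sup\{|D_{r,r'} h(\theta; q) - D_{r,r'} h(\phi(\mu); q)| : |\theta - \phi(\mu)| < \varepsilon\}$. Then
$E|u_{r,r'}(\varepsilon; Y_1)| \to 0$ as $\varepsilon \to 0$, for all $1 \le r, r' \le s$. (3)
(A6) The matrix $\Lambda = [E D_{r,r'} h(\phi(\mu); Y_1)]_{r,r'=1,\dots,s}$ is nonsingular.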
AN OMNIBUS CENTRAL LIMIT THEOREM FOR FRECHET MEAN
Theorem 2.1 (Bhattacharya and Lin (2016))
Under assumptions (A1)-(A6),
$n^{1/2}[\phi(\mu_n) - \phi(\mu)] \xrightarrow{\mathcal{L}} N(0, \Lambda^{-1} C \Lambda^{-1})$, as $n \to \infty$, (4)
where $C$ is the covariance matrix of $\{D_r h(\phi(\mu); Y_1),\ r = 1, \dots, s\}$.
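As a sanity check on the form of the limiting covariance, consider the simplest case, which is an illustrative assumption and not an example from the talk: $M = \mathbb{R}^s$ with the Euclidean distance, $\phi$ the identity, and $Y_1$ with finite second moments and covariance matrix $\Sigma$. The sandwich formula then reduces to the classical multivariate CLT.

```latex
\begin{align*}
h(x; q) &= \|x - q\|^2, \qquad \mu = E\,Y_1, \qquad \mu_n = \bar{Y}_n, \\
D_r h(x; q) &= 2\,(x_r - q_r), \qquad D_{r,r'} h(x; q) = 2\,\delta_{r r'}, \\
\Lambda &= \bigl[E\, D_{r,r'} h(\mu; Y_1)\bigr] = 2 I_s, \\
C &= \operatorname{Cov}\bigl(2(\mu - Y_1)\bigr) = 4\,\Sigma, \\
\Lambda^{-1} C \Lambda^{-1} &= \tfrac{1}{2} I_s \,(4\Sigma)\, \tfrac{1}{2} I_s = \Sigma, \\
\text{so}\quad n^{1/2}(\bar{Y}_n - \mu) &\xrightarrow{\mathcal{L}} N(0, \Sigma).
\end{align*}
```

On a curved space $\Lambda$ is no longer a multiple of the identity, which is how the geometry enters the asymptotic dispersion.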
A CENTRAL LIMIT THEOREM FOR FRECHET MEAN
Remark 3. For the intrinsic mean, Theorem 2.1 holds only if $Q$ assigns probability zero to a neighborhood of the cut locus of $\mu$.
A CLT for Intrinsic Means
Theorem 2.2 (Bhattacharya and Lin (2016))
Let $C(B)$ denote the set of cut loci of points $p \in B$. Also, let $\phi$ be the log map, or $\mathrm{Exp}^{-1}$. Suppose that $Q$ has an intrinsic mean $\mu$, and that $Q$ is absolutely continuous in a neighborhood $W$ of the cut locus of $\mu$, with a continuous density there with respect to the volume measure. Assume also that
(i) $Q(C(B(\mu; \varepsilon))) = O(\varepsilon^{d-c})$ as $\varepsilon \to 0$, for some $c$, $0 \le c < d$;
(ii) on some neighborhood $V$ of $\nu = \phi(\mu) = 0$ the function $\theta \mapsto F(\phi^{-1}(\theta))$ is twice continuously differentiable with a nonsingular Hessian $\Lambda(\theta)$; and
(iii) (A4) holds with $\phi(\mu)$ replaced by $\theta$, for all $\theta \in V$.
Then, if $d > c + 2$, one has the CLT (4) for the sample mean $\mu_n$.
A CLT for Intrinsic Means on $S^d$
Corollary 2.3
Let $M = S^d$, $d > 2$. If $Q$ has a $C^2$ density and has a unique intrinsic mean, then the CLT holds for the sample intrinsic mean.
EXAMPLES & APPLICATIONS
2(a). Example 1 ($S^d$).
Let $X_1, \dots, X_n$ be i.i.d. on $S^d$. The von Mises-Fisher distribution on the sphere $S^d$ has the following density (w.r.t. the uniform distribution on $S^d$):
$f(x; \mu, \tau) = C_d(\tau) \exp\{\tau \langle x, \mu \rangle\}$, $x \in S^d$ ($\mu \in S^d$, $\tau \ge 0$). (5)
For $\tau > 0$, the intrinsic and extrinsic means are both $\mu$. The MLE of $\mu$ is the sample extrinsic mean $\mu_{n,E}$.
APPLICATIONS of $S^d$
Application 1 (Paleomagnetism). In a seminal paper, Fisher (1953) estimated mean directions of the magnetic pole for two sets of data: one from a recent period and one from a geologically different period in the past. Using the model (5), he constructed confidence regions for the two mean directions, and provided convincing evidence that the polarities had nearly reversed. We compare Fisher's confidence regions with nonparametric confidence regions for the extrinsic mean, for two sets of data.
Figure: Confidence regions for the direction of earth's magnetic poles, using Fisher's method (red) and the nonparametric extrinsic method (blue), in Fisher's first example.
Figure: Confidence regions for the direction of earth's magnetic poles, using Fisher's method (red) and the nonparametric extrinsic method (blue), based on the Jurassic period data of Irving (1963).
In both cases, Fisher's confidence regions are about 10% larger (in area) than those given by the nonparametric method.
KENDALL’S SHAPE SPACES
2(b). KENDALL'S SHAPE SPACES $\Sigma_m^k$.
Each observation $x = (x_1, \dots, x_k)$ is a k-ad of $k > m$ points in $m$ dimensions (not all the same): $k$ locations on an $m$-dimensional object. k-ads are equivalent mod $G$, a group of transformations.
$G$ is generated by translations, scaling (to unit size), and rotations.
Preshape
$u = (x_1 - \langle x \rangle, \dots, x_k - \langle x \rangle)/\|x - \langle x \rangle\|$
$u \in S^{m(k-1)-1}$, the preshape sphere.
Shape of the k-ad: $\sigma(x) \in S^{m(k-1)-1}/SO(m) = \Sigma_m^k$.
KENDALL’S SHAPE SPACES
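Case $m = 2$. Planar shapes. $M = \Sigma_2^k$.
$\sigma(x) = \sigma(u) \equiv [u] = \{e^{i\theta} u : -\pi < \theta \le \pi\}$.
$M \simeq S^{2k-3}/SO(2) \simeq \mathbb{CP}^{k-2}$ (complex projective space).
Extrinsic mean $\mu_E$. Embedding:
$J : \sigma(x) \mapsto u u^* \in S(k, \mathbb{C})$ ($k \times k$ Hermitian matrices).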
KENDALL’S SHAPE SPACES
Proposition 2 (BP (2003)). $\mu_E$ exists iff the largest eigenvalue $\lambda$ of $E(u u^*)$ is simple. [$J(\mu_E) = \mu_0 \mu_0^*$, where $\mu_0$ is a unit eigenvector for $\lambda$.]
Case $m > 2$. $\Sigma_m^k$ has singularities. The action of $SO(m)$ on the preshape sphere is not free.
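Proposition 2 suggests a direct computation of the sample extrinsic mean planar shape: average the matrices $u u^*$ over the sample and take a unit eigenvector for the largest eigenvalue. The sketch below assumes the preshapes are stored as rows of a complex array; the function name and the tolerance used to decide whether the top eigenvalue is simple are illustrative choices.

```python
import numpy as np

def planar_extrinsic_mean(preshapes, tol=1e-10):
    """Sample extrinsic mean of planar shapes under J: [u] -> u u^*.

    preshapes: complex array of shape (n, k), each row a centered unit vector.
    Returns a unit eigenvector mu_0 for the largest eigenvalue of the sample
    mean of u u^*; mu_0 is a preshape representing the mean shape.
    """
    U = np.asarray(preshapes, dtype=complex)
    S = (U.T @ U.conj()) / U.shape[0]      # (1/n) * sum_i u_i u_i^*, Hermitian
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    if eigvals[-1] - eigvals[-2] < tol:
        raise ValueError("largest eigenvalue is not simple; no unique extrinsic mean")
    return eigvecs[:, -1]

# Rotated copies of one preshape all represent the same shape.
u = np.array([1, 1j, -1, -1j]) / 2.0
sample = np.array([np.exp(1j * t) * u for t in (0.0, 0.7, 2.1, -1.3)])
mu0 = planar_extrinsic_mean(sample)
```

The returned eigenvector is determined only up to a unit complex factor, which is consistent with the shape being an $SO(2)$ orbit of preshapes.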
KENDALL’S SHAPE SPACES
Remark 4 (Riemannian Submersions). Other than regular submanifolds of $E^D$, such as $S^d$, which inherit the Euclidean metric tensor, most of the manifolds of interest here are of the form $M = N/G$, where $N$ is a Riemannian manifold and $G$ is a Lie group of isometries on $N$. $M$ is the space of orbits $O_x = \{gx : g \in G\}$ ($x \in N$). The tangent space $T_p(M)$ at $p = O_x \in M$ is the horizontal subspace of $T_x(N)$, orthogonal to the direction along the orbit. $T_p(M)$ inherits the metric tensor from $T_x(N)$ on this subspace.
Example: $N = S^{2(k-1)-1} = S^{2k-3}$, $G = SO(2)$, $M = \Sigma_2^k$.
EXAMPLES & APPLICATIONS FOR $\Sigma_2^k$
Two-sample problem: discrimination between two shape distributions.
APPLICATION 2 (Bookstein (1991), Dryden and Mardia (1998), BP (2005), BB (2008), (2012)). Brain scan shapes of schizophrenic and normal patients.
$k = 13$ landmarks were recorded on the midsagittal slice of the brain scan of each of $n_1 = 14$ schizophrenic patients and $n_2 = 14$ normal patients (Bookstein (1991)). Shape space $\Sigma_2^{13}$.
[Panel (a): 14 normal children, 13 landmarks, along with the mean shape. Panel (b): 14 schizophrenic children, 13 landmarks, along with the mean shape.]
Figure: (a) and (b) show 13 landmarks for 14 normal and 14 schizophrenic children respectively, along with the respective mean shapes. * corresponds to the mean shapes' landmarks.
Figure: The sample extrinsic means for the 2 groups along with the pooled sample mean, corresponding to Figure 3.
p-values: nonparametric tests (intrinsic & extrinsic): $4 \times 10^{-11}$; Goodall's parametric test: 0.01; Hotelling's $T^2$ test: 0.66.
EXAMPLES & APPLICATIONS FOR $\Sigma_2^k$
APPLICATION 3. Shapes of male and female gorilla skulls.
$k = 8$ landmarks, $n_1 = 29$ male skulls, $n_2 = 30$ female skulls (BP (2005), BB (2008), (2012), Dryden & Mardia (1998)). Shape space $\Sigma_2^8$.
Figure: (a) and (b) show 8 landmarks from skulls of 30 female and 29 male gorillas respectively, along with the respective sample mean shapes. * corresponds to the mean shapes' landmarks.
p-values: nonparametric tests (intrinsic & extrinsic): $< 10^{-16}$; parametric tests (Hotelling's $T^2$ test, Box's M-test): 0.0001.
2(c). 3D Shape Space $R\Sigma_3^k$: Match Pair Test for Glaucoma
Assume the affine span of each k-ad is $\mathbb{R}^m$, with preshape $u = (u_1, \dots, u_k) \in S^{m(k-1)-1}$.
Shape $\sigma(x) \in S^{m(k-1)-1}/O(m) = M$. Embedding
$J : \sigma(x) \mapsto ((u_i \cdot u_j))$ ($M \to S_0^+(k, \mathbb{R})$)
(Bandulasiri and Patrangenaru (2005), Bandulasiri and BP (2009), Dryden, Kume, Le, Wood (2008)).
Proposition 3 (A. Bhattacharya (2008)). Let $\lambda_1 \ge \dots \ge \lambda_k$ be the eigenvalues of $E((u_i \cdot u_j))$, with corresponding orthonormal eigenvectors $U_1, \dots, U_k$. Then (i) $\mu_E$ exists iff $\lambda_m > \lambda_{m+1}$, and then (ii) $J(\mu_E) = (v_1, \dots, v_m)(v_1, \dots, v_m)^t$, where $v_j = (\lambda_j - \bar{\lambda} + 1/m)^{1/2} U_j$, with $\bar{\lambda} = (\lambda_1 + \dots + \lambda_m)/m$.
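The formula in Proposition 3 can be sketched numerically as follows, taking as input the mean of the $k \times k$ Gram matrices $((u_i \cdot u_j))$. The function name, the argument layout and the tolerance for the eigenvalue gap are assumptions made for this illustration.

```python
import numpy as np

def extrinsic_mean_from_gram(gram_mean, m, tol=1e-10):
    """Image J(mu_E) of the extrinsic mean, following Proposition 3.

    gram_mean: (k, k) mean of the Gram matrices ((u_i . u_j)) of the preshapes.
    m: dimension of the ambient space of the landmarks.
    Returns the k x k matrix V V^t with columns
    v_j = sqrt(lambda_j - lambda_bar + 1/m) * U_j, j = 1, ..., m.
    """
    eigvals, eigvecs = np.linalg.eigh(np.asarray(gram_mean, float))
    order = np.argsort(eigvals)[::-1]            # sort eigenvalues in descending order
    lam, U = eigvals[order], eigvecs[:, order]
    if lam[m - 1] - lam[m] < tol:
        raise ValueError("lambda_m is not larger than lambda_{m+1}; no unique extrinsic mean")
    lam_bar = lam[:m].mean()
    scale = np.sqrt(np.clip(lam[:m] - lam_bar + 1.0 / m, 0.0, None))
    V = U[:, :m] * scale                         # scale each of the top m eigenvectors
    return V @ V.T

# A point mass at one preshape: the extrinsic mean should be that shape itself.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
centered = pts - pts.mean(axis=0)
u = centered / np.linalg.norm(centered)          # k x m preshape, rows are landmarks
G = u @ u.T                                      # Gram matrix ((u_i . u_j))
J_mu = extrinsic_mean_from_gram(G, m=3)
```

The shift by $-\bar{\lambda} + 1/m$ keeps the trace of the result equal to 1, matching the unit norm of a preshape.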
APPLICATION of $R\Sigma_3^k$
To detect any shape change of the inner eye due to glaucoma, 3D images of the optical nerve head (ONH) of both eyes of 12 mature rhesus monkeys were recorded. One of the eyes was subjected to increased intraocular pressure (IOP). $k = 5$ landmarks of the inner eye were measured on each eye. For this match pair experiment, the manifold is $R\Sigma_3^k \times R\Sigma_3^k$. The null hypothesis is that the (extrinsic) mean lies on the diagonal of this product manifold (BP (2005), BB (2009)). The p-value of the nonparametric chi-square test is $1.55 \times 10^{-5}$ (BB (2009)).
2(d). $\mathrm{Sym}^+(p)$: $p \times p$ Positive Definite Matrices
1. Euclidean metric: $\|A\|^2 = \mathrm{Trace}(A^2)$. $\mathrm{Sym}^+(p)$ is an open convex subset of $\mathrm{Sym}(p)$, and $Q$ on $\mathrm{Sym}^+(p)$ has Euclidean mean $\mu_E = \int A\, Q(dA)$.
2. Log-Euclidean metric (Arsigny et al. (2006)). $J \equiv \log : \mathrm{Sym}^+(p) \to \mathrm{Sym}(p)$ is the inverse of the exponential map $B \mapsto e^B$, $\mathrm{Sym}(p) \to \mathrm{Sym}^+(p)$ (a diffeomorphism).
$d_{LE}(A_1, A_2) = \|\log(A_1) - \log(A_2)\|$.
$\mu_{LE} = \exp\left(\int \log(A)\, Q(dA)\right)$ (extrinsic mean under $J$).
This is also the intrinsic mean under the bi-invariant metric of $\mathrm{Sym}^+(p)$ as a Lie group with $A_1 \circ A_2 = \exp(\log(A_1) + \log(A_2))$ (zero curvature).
3. Affine invariant metric.
$d_{AI}^2(A_1, A_2) = \|\log(A_1^{-1/2} A_2 A_1^{-1/2})\|^2$.
$\langle B_1, B_2 \rangle = \mathrm{Trace}(A^{-1} B_1 A^{-1} B_2)$ on the tangent space at $A$ (non-positive curvature).
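The log-Euclidean and affine invariant distances, and the log-Euclidean mean, can be sketched with eigendecompositions of symmetric matrices. This is a minimal illustration under the assumption that inputs are symmetric positive definite NumPy arrays; the helper names are hypothetical.

```python
import numpy as np

def _apply_to_eigenvalues(A, f):
    """Apply a scalar function f to the eigenvalues of a symmetric matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def spd_log(A):
    """Matrix logarithm of a symmetric positive definite matrix."""
    return _apply_to_eigenvalues(A, np.log)

def sym_exp(B):
    """Matrix exponential of a symmetric matrix."""
    return _apply_to_eigenvalues(B, np.exp)

def dist_log_euclidean(A1, A2):
    """d_LE(A1, A2) = ||log(A1) - log(A2)|| (Frobenius norm)."""
    return float(np.linalg.norm(spd_log(A1) - spd_log(A2), "fro"))

def dist_affine_invariant(A1, A2):
    """d_AI(A1, A2) = ||log(A1^{-1/2} A2 A1^{-1/2})|| (Frobenius norm)."""
    inv_sqrt = _apply_to_eigenvalues(A1, lambda w: w ** -0.5)
    M = inv_sqrt @ A2 @ inv_sqrt
    M = (M + M.T) / 2.0                  # remove rounding asymmetry before eigh
    return float(np.linalg.norm(spd_log(M), "fro"))

def log_euclidean_mean(mats):
    """Sample log-Euclidean mean: exp of the average of the matrix logs."""
    return sym_exp(np.mean([spd_log(A) for A in mats], axis=0))

A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 3.0]])
B = np.array([[1.0, 0.1, 0.3], [0.1, 2.0, 0.0], [0.3, 0.0, 1.5]])
d_le = dist_log_euclidean(A, B)
d_ai = dist_affine_invariant(A, B)
```

The name "affine invariant" refers to the property $d_{AI}(G A_1 G^t, G A_2 G^t) = d_{AI}(A_1, A_2)$ for every invertible $G$, which the log-Euclidean distance does not share in general.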
2(d). $\mathrm{Sym}^+(p)$: $p \times p$ Positive Definite Matrices
APPLICATIONS ($p = 3$). DTI (Diffusion Tensor Imaging) provides measurements of the diffusion matrix of water molecules in tiny voxels in the white matter of the brain. Anisotropy in the presence of the structural barriers of nerve fibers is reduced when a trauma occurs (Parkinson's, Alzheimer's, ...). Challenges to statistical inference. Also see Schwartzman (2014).
APPLICATIONS (p = 3)- HIV IMAGING DATA
Diffusion-weighted images were acquired for each of 46 subjects, with 28 HIV+ subjects and 18 healthy controls.
In previous DTI findings, the diffusion tensors in the splenium of the corpus callosum were found to be significantly different between the HIV+ and control groups. We examine the finite sample performance of our method by using this fiber tract.
Diffusion tensors were constructed for 75 voxels along the fiber.
In order to detect meaningful group differences, registration is crucial. The 46 HIV DTI data used in our studies, including the splenium tracts and diffusion tensors on them, were registered in the same atlas space.
APPLICATIONS (p = 3)- HIV IMAGING DATA
We first carry out the two-sample tests (voxel-wise) using a test statistic based on the usual Euclidean distance:
$(\bar{X} - \bar{Y})\,\Sigma^{-1}\,(\bar{X} - \bar{Y})^{T}$,
where $\bar{X}$ and $\bar{Y}$ are the sample mean vectors, of dimension 6, of X and Y respectively, and $\Sigma = \frac{1}{n_1}\Sigma_X + \frac{1}{n_2}\Sigma_Y$.
The test statistic has an asymptotic chi-square distribution $\chi^2(6)$.
Next is a plot of the p-values along the fiber tracts.
We can apply the Benjamini-Yekutieli procedure to control the false discovery rate.
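The voxel-wise test above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic (not the HIV study), $\Sigma_X$ and $\Sigma_Y$ are taken to be the usual sample covariance matrices, and the function names `two_sample_statistic` and `chi2_sf_6` are hypothetical. The chi-square tail with 6 degrees of freedom is computed from its closed form, so no statistics library is needed.

```python
import math
import numpy as np

def chi2_sf_6(x):
    """Upper tail P(chi2_6 > x) of the chi-square law with 6 degrees of freedom.

    For even degrees of freedom the tail has a closed form:
    exp(-x/2) * (1 + x/2 + (x/2)**2 / 2).
    """
    h = x / 2.0
    return math.exp(-h) * (1.0 + h + h * h / 2.0)

def two_sample_statistic(X, Y):
    """X: (n1, 6) array, Y: (n2, 6) array of vectorized tensors at one voxel."""
    n1, n2 = X.shape[0], Y.shape[0]
    diff = X.mean(axis=0) - Y.mean(axis=0)
    # Sigma = (1/n1) Sigma_X + (1/n2) Sigma_Y, with sample covariances (assumption).
    sigma = np.cov(X, rowvar=False) / n1 + np.cov(Y, rowvar=False) / n2
    # Quadratic form diff' Sigma^{-1} diff, computed via a linear solve.
    stat = float(diff @ np.linalg.solve(sigma, diff))
    return stat, chi2_sf_6(stat)

# Synthetic illustration only: group sizes 28 and 18 mirror the slides.
rng = np.random.default_rng(0)
X = rng.normal(size=(28, 6))
Y = rng.normal(size=(18, 6)) + 0.5
stat, p_value = two_sample_statistic(X, Y)
print(stat, p_value)
```

Running this once per voxel would give one p-value per location along the fiber, which is the input a multiple-testing procedure needs.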
APPLICATIONS (p = 3)- HIV IMAGING DATA
[Figure: p-values along the fibers using Euclidean distance. Horizontal axis: arc length (0 to 50); vertical axis: p-values (0 to 0.7).]
FALSE DISCOVERY RATE. BENJAMINI-HOCHBERG PROCEDURE
Set α = 0.05. Apply the Benjamini-Hochberg procedure to the tests.
Reject only the k null hypotheses with the smallest p-values, where $k = \max\{i : p_{(i)} \le \frac{i}{m}\alpha\}$ and $p_{(1)} \le \dots \le p_{(m)}$ are the ordered p-values of the m tests.
In our example we first order the 75 p-values corresponding to the tests carried out at all the locations.
The ordered p-values are compared with the vector {0.05/75, 0.1/75, ..., 0.05}, which gives the result k = 58.
Therefore we reject the 58 null hypotheses corresponding to the first 58 ordered p-values.
The false discovery rate is no larger than $\frac{m_0}{m}\alpha \le \alpha$, where $m_0$ is the number of true null hypotheses.
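The step-up rule can be sketched as follows. This is a minimal illustration, not the code used for the study: the function name `benjamini_hochberg` and the four p-values are made up. It compares the i-th smallest p-value with iα/m and keeps the largest i for which the bound holds.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (k, rejected_indices) for the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    # Indices sorted so that p_(1) <= ... <= p_(m).
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank  # keep the largest rank that satisfies the bound
    # Reject the k hypotheses with the smallest p-values.
    return k, sorted(order[:k])

# Made-up p-values (not the 75 voxel-wise p-values of the study).
p = [0.02, 0.021, 0.022, 0.9]
k, rejected = benjamini_hochberg(p, alpha=0.05)
print(k, rejected)  # prints: 3 [0, 1, 2]
```

In this toy input the smallest p-value (0.02) exceeds its own threshold 0.05/4, yet it is still rejected because the third ordered p-value meets its bound. That is the step-up feature that separates the procedure from a sequential stop at the first failure.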
APPLICATIONS (p=3)- HIV IMAGING DATA
Second, we carry out two-sample tests based on the log-Euclidean distance of the DTI matrices. The matrix logarithm of each raw diffusion matrix is first calculated. The test statistic is based on the Euclidean distance of the 6 distinct entries of the log matrices.
Next is a plot of the p-values along the fiber tracts.
To control the false discovery rate, we also carry out the Benjamini-Hochberg procedure. We reject the first 48 tests based on the ordered p-values.
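The log-Euclidean preprocessing step can be sketched as below, assuming each diffusion tensor is a 3 x 3 symmetric positive definite matrix. The names `spd_log` and `log_vec` and the numerical tensor are hypothetical. The matrix logarithm is taken through the eigendecomposition, and the 6 distinct entries are read off the upper triangle without any rescaling of the off-diagonal terms, which is one possible reading of "the 6 distinct values".

```python
import numpy as np

def spd_log(A):
    """Matrix logarithm of a symmetric positive definite matrix.

    With A = V diag(lam) V^T, log(A) = V diag(log lam) V^T.
    """
    eigvals, eigvecs = np.linalg.eigh(A)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def log_vec(A):
    """The 6 distinct entries (upper triangle) of log(A) for a 3 x 3 tensor."""
    L = spd_log(A)
    return L[np.triu_indices(3)]

# A hypothetical diffusion tensor (illustrative values only).
D = np.array([[1.7, 0.2, 0.0],
              [0.2, 0.4, 0.1],
              [0.0, 0.1, 0.3]])
v = log_vec(D)
print(v)
```

The resulting 6-vectors, one per subject and voxel, would then enter the same quadratic-form statistic that was used for the raw tensor entries.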
APPLICATIONS (p=3)- HIV IMAGING DATA
[Figure: p-values along the fibers using log-Euclidean distance. Horizontal axis: arc length (0 to 50); vertical axis: p-values (0 to 0.8). Legend: p-value; 0.05.]
Plot of the p-values
[Figure: p-values along the fibers using Euclidean and log-Euclidean distance. Horizontal axis: arc length (0 to 50); vertical axis: p-values (0 to 0.8). Legend: log-Euclidean; Euclidean; 0.05.]
2(e). Stratified Spaces (1) $\Sigma^k_m$ (m > 2)
Stratified spaces S are made up of several subspaces of different dimensions.
A familiar example is $\Sigma^k_m$ with m > 2. After translation and scaling the k-ads lie in (and fill out) a preshape sphere $S^{mk-m-1}$. The shape space is then viewed as $\Sigma^k_m = S^{mk-m-1}/SO(m)$.
For simplicity, consider m = 3. One may split $\Sigma^k_3$ into two strata. The larger stratum $S_1$ corresponds to shapes of non-collinear k-ads.
$S_1$ is a manifold of dimension 3k − 4 − 3 = 3k − 7. The manifold is not complete in the geodesic distance.
The other stratum $S_0$ comprises shapes of collinear k-ads, each k-ad being a set of k collinear points in $\mathbb{R}^3$. Each orbit has dimension 2, since rotations about the common line leave such a k-ad fixed. The stratum $S_0$ may be given the structure of a differentiable manifold of dimension k − 2.
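A small sketch of how one might decide, numerically, which stratum the shape of a given k-ad in $\mathbb{R}^3$ belongs to. The helper names (`preshape`, `stratum`, `stratum_dimension`), the rank tolerance, and the two example k-ads are assumptions made for illustration. A centered k-ad is collinear exactly when its k x 3 coordinate matrix has rank 1.

```python
import numpy as np

def preshape(kad):
    """Center a k-ad (k x 3 array) and scale it to unit norm: a point of S^{3k-4}."""
    centered = kad - kad.mean(axis=0)
    norm = np.linalg.norm(centered)
    if norm == 0:
        raise ValueError("all k points coincide; no shape is defined")
    return centered / norm

def stratum(kad, tol=1e-10):
    """Return 'S0' for collinear k-ads and 'S1' otherwise (case m = 3)."""
    z = preshape(kad)
    rank = np.linalg.matrix_rank(z, tol=tol)
    return "S0" if rank == 1 else "S1"

def stratum_dimension(k, name):
    """Dimensions quoted on the slide: k - 2 for S0 and 3k - 7 for S1."""
    return k - 2 if name == "S0" else 3 * k - 7

# Hypothetical examples: 4 points on a line, and 4 points in general position.
collinear = np.outer([0.0, 1.0, 2.5, 4.0], [1.0, 2.0, -1.0])
generic = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
for kad in (collinear, generic):
    s = stratum(kad)
    print(s, stratum_dimension(len(kad), s))
```

With real landmark data, near-collinear configurations would make the rank decision sensitive to `tol`, so the threshold is a modelling choice rather than a fixed constant.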
2(e). Stratified Spaces (2) Open Book (Hotz et al. (2013))
An open book O is the disjoint union of K open leaves $L_j^+ = (H, j)$, 1 ≤ j ≤ K, joined at the spine $L_0$ as the common boundary. Here $H = (0,\infty) \times \mathbb{R}^d$ and $L_0 = \{0\} \times \mathbb{R}^d$.
The distance ρ on $L_j^+$, or $L_0$, is the usual Euclidean distance on $\mathbb{R}^{d+1}$, but for j ≠ k, $\rho((x, j), (y, k)) = |x - Ry|$, where Ry is the reflection, $Ry = (-y^{(0)}, y^{(1)}, \dots, y^{(d)})$ for all $y = (y^{(0)}, y^{(1)}, \dots, y^{(d)}) \in [0,\infty) \times \mathbb{R}^d$.
The open book is a geodesic space with non-positive curvature in the sense of A.D. Alexandrov and therefore Q has a unique Frechet mean.
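The distance ρ can be written out directly. The sketch below assumes a point of the open book is stored as a pair `(x, j)`, with `x` a tuple in $[0,\infty) \times \mathbb{R}^d$ and `j` the leaf label; this representation and the function names are illustrative choices, not taken from Hotz et al. (2013).

```python
import math

def reflect(y):
    """R y = (-y0, y1, ..., yd): reflection through the spine."""
    return (-y[0],) + tuple(y[1:])

def open_book_distance(p, q):
    """Distance between p = (x, j) and q = (y, k) on the open book."""
    (x, j), (y, k) = p, q
    # Same leaf: plain Euclidean distance. Different leaves: reflect y first,
    # so the shortest path passes through the spine.
    target = y if j == k else reflect(y)
    return math.dist(x, target)

# d = 1: points at height 1 and 2 above the spine.
same_leaf = open_book_distance(((1.0, 0.0), 1), ((2.0, 0.0), 1))
across = open_book_distance(((1.0, 0.0), 1), ((2.0, 0.0), 2))
print(same_leaf, across)  # prints: 1.0 3.0
```

For a point on the spine the first coordinate is 0, so the reflection changes nothing and the leaf label attached to it is irrelevant, which matches the leaves being glued along $L_0$.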
2(e). Stratified Spaces (2) Open Book (Hotz et al. (2013))
Consider the map $F_j : O \to \mathbb{R}^{d+1}$, $F_j((x, j)) = x$, $F_j((x, k)) = Rx$ if k ≠ j. Write $m_j = \int x^{(0)}\,(Q \circ F_j^{-1})(dx)$.
Under the assumption $Q(L_j^+) > 0$ for 1 ≤ j ≤ K, either (1) $m_j \ge 0$ for some j, and $m_k < 0$ for all k ≠ j, or (2) $m_j < 0$ for all j.
In case (2), the Frechet mean is sticky, that is, with probability one, $\mu_N \in L_0$ for all sufficiently large N. Also, the classical CLT holds on the d-dimensional space $L_0$.
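To make the dichotomy concrete, the empirical analogue of $m_j$ can be computed from a sample: under $F_j$ the first coordinate keeps its sign on leaf j and is negated on every other leaf. The sketch below is a plug-in illustration only. The function names, the data layout `(x, leaf)`, and the two toy samples are assumptions, and the theorem itself concerns the population quantities $m_j$.

```python
def folded_first_moments(sample, K):
    """Empirical m_j, j = 1..K, from points (x, leaf) with x = (x0, ..., xd), x0 >= 0."""
    n = len(sample)
    moments = {}
    for j in range(1, K + 1):
        # First coordinate of F_j: +x0 on leaf j, -x0 on the other leaves.
        total = sum(x[0] if leaf == j else -x[0] for x, leaf in sample)
        moments[j] = total / n
    return moments

def classify(moments):
    """Map the signs of the m_j to the cases on the slide."""
    positive = [j for j, m in moments.items() if m > 0]
    zero = [j for j, m in moments.items() if m == 0]  # exact zero: boundary case
    if positive:
        return f"mean in open leaf {positive[0]}"
    if zero:
        return "mean on the spine (boundary case m_j = 0)"
    return "sticky: mean on the spine"

# Toy samples on K = 3 leaves with d = 1 (made-up numbers).
balanced = [((1.0, 0.3), 1), ((1.0, -0.2), 2), ((1.0, 0.5), 3)]
lopsided = [((5.0, 0.0), 1), ((1.0, 0.0), 2), ((1.0, 0.0), 3)]
print(classify(folded_first_moments(balanced, 3)))
print(classify(folded_first_moments(lopsided, 3)))
```

In the balanced sample no leaf carries more than half of the total first-coordinate mass, so every empirical $m_j$ is negative and the mean sits on the spine. In the lopsided sample leaf 1 dominates and the mean moves into that leaf.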
2(e). Stratified Spaces (2) Open Book
Recall case (1): $m_j \ge 0$ for some j, and $m_k < 0$ for all k ≠ j.
If in case (1), $m_j > 0$, then $\mu$ lies in the open leaf $L_j^+$, as does $\mu_N$ for all sufficiently large N; hence the classical (d + 1)-dimensional CLT holds.
If, however, $m_j = 0$, then $\mu \in L_0$; but $\mu_N \in L_j^+$ if (the empirical) $m_{j,N} > 0$ and $\mu_N \in L_0$ if $m_{j,N} \le 0$; hence the asymptotic distribution centered at $\mu$ is the distribution of $((X_+^{(0)}, X^{(1)}, \dots, X^{(d)}), j)$ on $L_j^+ \cup L_0$, where $(X^{(0)}, X^{(1)}, \dots, X^{(d)})$ has the Gaussian distribution stated under the preceding case $m_j > 0$, and $X_+^{(0)} = \max\{X^{(0)}, 0\}$.
2(e). Stratified Spaces (2) Open Book
Remark 5. The study of this and some toy models of phylogenetic trees has been motivated in part by the pioneering work of Susan Holmes and her collaborators.
3 (a). NONPARAMETRIC BAYES ON MANIFOLDS - DENSITY ESTIMATION
A. Bhattacharya and D. Dunson (2010, 2012), BB (2012)
Density of Q with respect to a standard measure on M, represented as a mixture, with mixing measure P(dθ), of a parametric family of densities K(x; θ) (θ ∈ Θ):
$f(x; P) = \int_\Theta K(x;\theta)\,P(d\theta)$.
P, a probability measure on Θ, is often given a Dirichlet process prior (Ferguson (1974)).
Sethuraman's stick-breaking representation $\sum_j w_j \delta_{Y_j}$ of the prior, with $w_1 = u_1$, $w_j = u_j(1-u_1)\cdots(1-u_{j-1})$ (j > 1). Here the $u_j$ are i.i.d. Beta(1, α(Θ)), where α is the base measure on Θ, and the $Y_j$ are i.i.d. ∼ G = α/α(Θ). Draws from the posterior are obtained by MCMC.
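A truncated version of the stick-breaking construction is easy to simulate. The sketch below draws the first few weights $w_j$ and atoms $Y_j$ of one realization of the prior. The truncation level, the total mass `total_mass` standing in for α(Θ), and the standard normal base distribution G are all illustrative assumptions; on a manifold the atoms would instead be kernel parameters θ drawn from a base measure on Θ.

```python
import random

def stick_breaking_weights(total_mass, n_atoms, rng):
    """First n_atoms weights w_j of Sethuraman's representation.

    total_mass plays the role of alpha(Theta); u_j ~ Beta(1, alpha(Theta)).
    """
    weights = []
    remaining = 1.0  # running product (1 - u_1) ... (1 - u_{j-1})
    for _ in range(n_atoms):
        u = rng.betavariate(1.0, total_mass)
        weights.append(u * remaining)  # w_j = u_j * (stick length still left)
        remaining *= 1.0 - u
    return weights

rng = random.Random(0)
w = stick_breaking_weights(total_mass=2.0, n_atoms=50, rng=rng)
# Atoms Y_j drawn i.i.d. from a hypothetical base distribution G (standard normal).
atoms = [rng.gauss(0.0, 1.0) for _ in w]
print(sum(w))
```

The weights sum to one minus the unbroken remainder of the stick, so the printed total shows how much mass the truncation discards. A larger α(Θ) spreads the mass over more atoms and calls for a higher truncation level.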
3 (a). NONPARAMETRIC BAYES ON MANIFOLDS - DENSITY ESTIMATION
Example. A density on $\Sigma^k_2$ is estimated by the kernel method (KD) (Pelletier (2005)), NP Bayes and MLE. A simulation study yielded the following estimates of the mean L1 distance for these methods: NP (0.44), KD (0.75), MLE (1.03).
3 (b). NONPARAMETRIC BAYES ON MANIFOLDS - CLASSIFICATIONS
Classifications. $\Sigma^8_2$ (Gorilla Skulls). $n_1 = 30$, $n_2 = 29$. 25 randomly chosen from each group as the training samples. The remaining 9 were classified.
Figure: Estimated shape densities of gorillas: Female (solid), Male (dotted). Estimate (r), 95% C.R. (b, g).
[Plot: Predictive densities: Female (−), Male (..). Horizontal axis from −0.1 to 0.15; vertical axis from 0 to 7 × 10^18.]
Densities evaluated at a dense grid of points drawn from the unit speed geodesic starting at the female extrinsic mean in the direction of the male extrinsic mean.
REFERENCES
Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. (2006). Magn. Reson. Med.
Bandulasiri, A., Bhattacharya, R., and Patrangenaru, V. (2009). JMVA.
Bhattacharya, A. (2008). Sankhya, A.
Bhattacharya, A. and Bhattacharya, R. (2008). Proc. Amer. Math. Soc.
Bhattacharya, A. and Bhattacharya, R. (2012). IMS Monograph Series #2.
Bhattacharya, A. and Dunson, D. (2010). Biometrika.
Bhattacharya, A. and Dunson, D. (2012). Ann. Inst. Statist. Math.
Bhattacharya, R. and Lin, L. (2016). Proc. Amer. Math. Soc.
Bhattacharya, R. and Patrangenaru, V. (2005). Ann. Statist.
Bhattacharya, R. and Patrangenaru, V. (2003). Ann. Statist.
Bhattacharya, R. and Lin, L. (2013). http://arxiv.org/abs/1306.5806
Bookstein, F. (1991). Cambridge University Press.
Dryden, I. L. and Mardia, K. V. (1998). Wiley, New York.
Dryden, I. L., Kume, A., Le, H., and Wood, A. T. A. (2008). Biometrika.
Fisher, R. A. (1953). Proc. Roy. Soc. London.
Ferguson, T. (1973, 1974). Ann. Statist.
Hotz, T., Huckemann, S., Le, H., Marron, J. S., Mattingly, J. C., Miller, E., Nolen, J., Owen, M., Patrangenaru, V., and Skwerer, S. (2013). Ann. Appl. Probab.
Irving, E. (1963). Wiley.
Hendriks, H. and Landsman, Z. (1996). C. R. Acad. Sci.
Hendriks, H. and Landsman, Z. (1998). JMVA.
Karcher, H. (1977). Comm. Pure Appl. Math.
Kendall, D. G. (1984). Bull. London Math. Soc.
Kendall, W. S. and Le, H. (2011). Brazilian J. of Probab. and Statist.
Pelletier, B. (2005). Statist. and Probab. Letters.
Schwartzman, A. (2014). To appear.
Sethuraman, J. (1994). Statist. Sinica.
Ziezold, H. (1977). Transactions of the Seventh Prague Conference on Information Theory, Statistical Decision Functions, Random Processes and of the Eighth European Meeting of Statisticians.