Applications of Algebraic Geometry IMA Annual Program Year Workshop Applications in Biology, Dynamics, and Statistics March 5-9, 2007 Information Geometry (IG) and Algebraic Statistics (AS) Giovanni Pistone (DIMAT Politecnico di Torino) http://staff.polito.it/giovanni.pistone/ [email protected]




Example: new cancer incidence and gender

• D. Geiger, C. Meek, and B. Sturmfels. On the toric algebra of graphical models. Ann. Statist., 34:1463–1492, 2006.

• F. Rapallo. Toric statistical models: Parametric and binomial representations. AISM, 2007. DOI 10.1007/s10463-006-0079-z

• G. Consonni and G. Pistone. Bayesian analysis of contingency tables with possibly zero-probability cells. Technical Report 0703123, arXiv, 2007

The following table reports data for different types of cancer, separated by gender, for Alaska in the year 1989 (Simonoff 2003). An asterisk marks a cell that cannot occur.

Type of cancer   Female   Male   Total
Lung                 38     90     128
Melanoma             15     15      30
Ovarian              18      *      18
Prostate              *    111     111
Stomach               0      5       5
Total                71    221     292

Clearly cells (3, 2) and (4, 1) are structural zeros, while we regard the zero count corresponding to the combination (Stomach, Female) = (5, 1) as a possibly zero-probability cell.


Quasi-independence

• A typical assumption of interest in this case is quasi-independence (QI), corresponding to the standard independence assumption for all sub-tables, excluding those having a structural zero.

• For this hypothesis, Simonoff (2003) finds a p-value between 2% and 3%, depending on the method employed. The data thus seem to provide significant, although not very strong, evidence against the QI-model.

• Let $I = \{1, 2, 3, 4, 5\}$ and $J = \{1, 2\}$ denote the sets of levels for the rows and columns respectively, and consider the two-way table with cells in the set $A = I \times J \setminus \{(3, 2), (4, 1)\}$, i.e. with cells (3, 2) and (4, 1) missing.

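• Under the QI-model the un-normalized cell probabilities $q_{ij}$ are given by

\[ q_{ij} = \rho_i \psi_j, \quad (i, j) \in A \qquad \text{(QI-model)} \]

• If $q_{ij} > 0$ for all $(i, j) \in A$, then

\[ \log q_{ij} = \alpha_i + \beta_j, \quad (i, j) \in A, \]

with $\alpha_i = \log \rho_i$, $\beta_j = \log \psi_j \in \mathbb{R}$.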
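As an illustration of the product form $q_{ij} = \rho_i \psi_j$ on the free cells, the following sketch fits the QI-model to the cancer table by iterative proportional fitting. The fitting method, the function name `fit_quasi_independence` and the number of sweeps are assumptions made for this example; the slides do not say how the model was fitted.

```python
# Observed counts; None marks a structural zero (a cell excluded from the set A).
counts = [
    [38, 90],     # Lung
    [15, 15],     # Melanoma
    [18, None],   # Ovarian
    [None, 111],  # Prostate
    [0, 5],       # Stomach
]


def fit_quasi_independence(table, sweeps=200):
    """Iterative proportional fitting of q_ij = rho_i * psi_j on the free cells."""
    n_rows, n_cols = len(table), len(table[0])
    free = [[table[i][j] is not None for j in range(n_cols)] for i in range(n_rows)]
    row_tot = [sum(c for c in row if c is not None) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows) if free[i][j])
               for j in range(n_cols)]
    # Start from 1 on the free cells and 0 on the structural zeros.
    fit = [[1.0 if free[i][j] else 0.0 for j in range(n_cols)] for i in range(n_rows)]
    for _ in range(sweeps):
        for i in range(n_rows):  # rescale each row to its observed total
            s = sum(fit[i])
            fit[i] = [x * row_tot[i] / s for x in fit[i]]
        for j in range(n_cols):  # rescale each column to its observed total
            s = sum(fit[i][j] for i in range(n_rows))
            for i in range(n_rows):
                fit[i][j] *= col_tot[j] / s
    return fit


fitted = fit_quasi_independence(counts)
names = ["Lung", "Melanoma", "Ovarian", "Prostate", "Stomach"]
for name, row in zip(names, fitted):
    print(f"{name:9s} {row[0]:8.3f} {row[1]:8.3f}")
```

Rows 3 and 4 contain a single free cell each, so their fitted values reproduce the observed counts, and the remaining rows {1, 2, 5} are fitted as an ordinary independence table.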


Exponential and extended exponential model

• The design matrix $M$ and a matrix $K$ whose columns are orthogonal to the columns of $M$ are given below; rows are indexed by the cells $ij \in A$.

M =
cell   α1  α2  α3  α4  α5  β1  β2
11      1   0   0   0   0   1   0
21      0   1   0   0   0   1   0
31      0   0   1   0   0   1   0
51      0   0   0   0   1   1   0
12      1   0   0   0   0   0   1
22      0   1   0   0   0   0   1
42      0   0   0   1   0   0   1
52      0   0   0   0   1   0   1

K =
cell   k1  k2
11      1   0
21     -1  -1
31      0   0
51      0   1
12     -1   0
22      1   1
42      0   0
52      0  -1

• If $q_{ij} > 0$ for all $(i, j) \in A$, the QI-model is equivalent to the implicit binomial model

\[ \begin{cases} q_{11} q_{22} - q_{21} q_{12} = 0 \\ q_{51} q_{22} - q_{21} q_{52} = 0. \end{cases} \]

The above equations are the standard conditions for independence in the two $2 \times 2$ tables with rows $\{1, 2\}$ and $\{2, 5\}$ respectively. This is equivalent to the independence of the sub-table $\{1, 2, 5\} \times \{1, 2\}$.
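The relation between $M$, $K$ and the two binomials can be checked mechanically. The sketch below copies the two matrices from this slide, verifies that every column of $K$ is orthogonal to every column of $M$, and reads each binomial off the positive and negative entries of a column of $K$. The helper names (`column`, `dot`, `binomial`) are illustrative only.

```python
cells = ["11", "21", "31", "51", "12", "22", "42", "52"]

# Design matrix M: columns alpha1..alpha5, beta1, beta2.
M = [
    [1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 1],
]

# Matrix K: columns k1, k2.
K = [
    [1, 0], [-1, -1], [0, 0], [0, 1],
    [-1, 0], [1, 1], [0, 0], [0, -1],
]


def column(matrix, j):
    return [row[j] for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Inner products of each column of K with each column of M.
products = [[dot(column(K, c), column(M, a)) for a in range(7)] for c in range(2)]


def binomial(k):
    """Render prod q^{k+} - prod q^{k-} for a vector k with entries in {-1, 0, 1}."""
    plus = "".join(f"q{c}" for c, e in zip(cells, k) if e > 0)
    minus = "".join(f"q{c}" for c, e in zip(cells, k) if e < 0)
    return f"{plus} - {minus}"


binomials = [binomial(column(K, c)) for c in range(2)]
print(products)
print(binomials)
```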


Maximal model

• The maximal design matrix $M_{\max}$ and the maximal model in monomial form are computable with Computer Algebra Software.

M_max =
cell   ζ1  ζ2  ζ3  ζ4  ζ5  ζ6  ζ7
11      0   0   0   0   1   0   1
21      0   0   1   0   0   0   1
31      1   0   0   0   0   0   0
51      0   0   0   1   0   0   1
12      0   0   0   0   1   1   0
22      0   0   1   0   0   1   0
42      0   1   0   0   0   0   0
52      0   0   0   1   0   1   0

\[ \begin{aligned}
q_{11} &= \zeta_5 \zeta_7, & q_{12} &= \zeta_5 \zeta_6, \\
q_{21} &= \zeta_3 \zeta_7, & q_{22} &= \zeta_3 \zeta_6, \\
q_{31} &= \zeta_1, & q_{42} &= \zeta_2, \\
q_{51} &= \zeta_4 \zeta_7, & q_{52} &= \zeta_4 \zeta_6.
\end{aligned} \]

• CoCoATeam. CoCoA: a system for doing Computations in Commutative Algebra. Available at http://cocoa.dima.unige.it.

• 4ti2 team. 4ti2 – a software package for algebraic, geometric and combinatorial problems on linear spaces. Available at www.4ti2.de.

• The number of instances (also called feasible sets) for the QI-model is 87.
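The count of instances can be reproduced by brute force, reading an instance as a non-empty support obtained by setting some of the parameters $\zeta_j$ to zero in the maximal monomial model. That reading, and the helper name `support`, are assumptions of this sketch, which enumerates all $2^7$ zero patterns and collects the distinct supports.

```python
from itertools import product

# Rows of M_max: exponents of (zeta1, ..., zeta7) for each free cell.
M_MAX = {
    "11": (0, 0, 0, 0, 1, 0, 1),
    "21": (0, 0, 1, 0, 0, 0, 1),
    "31": (1, 0, 0, 0, 0, 0, 0),
    "51": (0, 0, 0, 1, 0, 0, 1),
    "12": (0, 0, 0, 0, 1, 1, 0),
    "22": (0, 0, 1, 0, 0, 1, 0),
    "42": (0, 1, 0, 0, 0, 0, 0),
    "52": (0, 0, 0, 1, 0, 1, 0),
}


def support(positive):
    """Cells whose monomial is nonzero when zeta_j > 0 exactly where positive[j] == 1."""
    return frozenset(
        cell for cell, exps in M_MAX.items()
        if all(positive[j] for j, e in enumerate(exps) if e > 0)
    )


supports = {support(z) for z in product((0, 1), repeat=7)}
supports.discard(frozenset())  # the all-zero table is not a probability
print(len(supports))
```

Under this reading the enumeration returns 87: the cells 31 and 42 can each be on or off, the sub-table {1, 2, 5} × {1, 2} has 21 non-empty rectangular supports plus the empty one, and 2 · 2 · 22 − 1 = 87.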


Bayesian analysis

• The model which imposes no restriction on the cell probabilities, save the zero-probability cells (3, 2) and (4, 1), is called the Structural Zero model SZ. The number of SZ-instances is equal to $2^8 - 1 = 255$.

• Only two of the above SZ-instances are logically consistent with the observed data: the one giving a positive probability to all eight free cells, and the one giving zero probability to cell (5, 1) only. We label these instances SZ0 and SZ1.

• There exists only one logically consistent instance of the quasi-independence model, namely the one having all positive cell probabilities (except for the two cells corresponding to structural zeros), which we label QI0.

• The models SZ0 and SZ1 are nested, both in the sense of the maximal parameterization and of the supports (faces), so that their a-priori Dirichlet distributions on the parameters should be related. ZI could be parameterized by QI0 and an orthogonal component.

• The two models SZ0 and SZ1 are different exponential models, and a prior on their union disintegrates into a discrete prior on the model and a conditional Dirichlet prior given the model.
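The two counts quoted for the SZ model follow from elementary enumeration. In the sketch below an SZ-instance is identified with its support, a non-empty subset of the eight free cells, and an instance is taken to be logically consistent with the data when every cell with a positive observed count belongs to the support. The variable names are illustrative.

```python
from itertools import combinations

cells = ["11", "21", "31", "51", "12", "22", "42", "52"]  # the eight free cells
observed = {"11": 38, "21": 15, "31": 18, "51": 0,
            "12": 90, "22": 15, "42": 111, "52": 5}

# SZ-instances: every non-empty subset of the free cells can be the support.
sz_instances = [frozenset(s) for r in range(1, 9) for s in combinations(cells, r)]

# Consistency with the data: cells with a positive count must have positive probability.
must_be_positive = {c for c, n in observed.items() if n > 0}
consistent = [s for s in sz_instances if must_be_positive <= s]

print(len(sz_instances), len(consistent))
```

The two consistent supports are the full set of eight cells (SZ0) and the set without cell 51, i.e. cell (5, 1) (SZ1).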


Statistical models as differentiable manifolds

• Statistical models have a very rich mathematical structure. We can approach this from many points of view: Commutative Algebra, Convex Analysis, Differential Geometry.

• The Differential Geometry approach is the oldest: statistical models are Riemannian manifolds according to C. R. Rao. Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37:81–89, 1945.

• More recently, it has been found that there are other manifold structures of interest, which Amari calls α-bundles and α-connections.

• Most of the constructions in the literature are restricted to parametric statistical models. However, there are applications where infinite dimensional statistical models appear, e.g. in Mathematical Finance the set of martingale measures on the Wiener space.

• A general coordinate-free construction is important at the conceptual level, because in such a framework the “big picture” comes out more clearly, as in S. Lang. Differential and Riemannian manifolds, volume 160 of Graduate Texts in Mathematics. Springer-Verlag, New York, third edition, 1995.


Exponential statistical manifolds

The theory of exponential statistical manifolds modeled on Orlicz spaces has been developed in joint work with C. Sempi, M.-P. Rogantin, P. Gibilisco, A. Cena, D. Imparato, B. Trivellato (1993-2007).

• G. Pistone and C. Sempi. An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. Ann. Statist., 23(5):1543–1561, October 1995;

• G. Pistone and M. P. Rogantin. The exponential statistical manifold: mean parameters, orthogonality and space transformations. Bernoulli, 5(4):721–760, August 1999;

• P. Gibilisco and G. Pistone. Connections on non-parametric statistical manifolds by Orlicz space geometry. IDAQP, 1(2):325–347, 1998;

• A. Cena and G. Pistone. Exponential statistical manifold. AISM, 59:27–56, 2007. DOI 10.1007/s10463-006-0096-y.

The main focus is on finding a rigorous functional setting for IG as described in

• S. Amari and H. Nagaoka. Methods of information geometry. American Mathematical Society, Providence, RI, 2000. Translated from the 1993 Japanese original by Daishi Harada.


Statistical varieties

In the case of a finite state space or in the case of special distributions, IG has an interesting interface with AS. This interface is both conceptually interesting and computationally useful.

• P. Diaconis and B. Sturmfels. Algebraic algorithms for sampling from conditional distributions. Ann. Statist., 26(1):363–397, 1998.

• G. Pistone and H. P. Wynn. Finitely generated cumulants. Statist. Sinica, 9(4):1029–1052, October 1999.

• G. Pistone, E. Riccomagno, and H. P. Wynn. Algebraic Statistics: Computational Commutative Algebra in Statistics. Chapman&Hall, 2001.

• D. Geiger, C. Meek, and B. Sturmfels. On the toric algebra of graphical models. Ann. Statist., 34:1463–1492, 2006.

The key notion is that of exponential statistical model, which has long been known to have special features, both analytic and algebraic.

The study of general infinite dimensional exponential models suggests how to avoid unnatural parameterizations of the finite state space models. In turn, the finite state space case suggests how to deal with limit cases.


Outline

Exponential statistical model

• Basic construction.

• Exponential and mixture connections.

• Models, divergence, tangent bundle.

Finite state space or special distributions

Extended exponential model

Tables with zero cells


IG as a special Banach manifold

Given a general probability space $(X, \mathcal{X}, \mu)$,

• $\mathcal{M}_>$ is the set of all densities which are positive $\mu$-a.s.

• $\mathcal{M}_>$ is thought of as the “maximal” regular statistical model.

• We want to assign a manifold structure to this maximal model in such a way that each specific statistical model could be considered as a “sub-manifold” of $\mathcal{M}_>$. We will see that the notion of sub-manifold to be used is not obvious.

• The model space for the manifold, locally at each $p \in \mathcal{M}_>$, is a subspace of centered random variables of a suitable Orlicz space. Orlicz spaces are constructed in analogy with Lebesgue spaces, by imposing conditions of the form $E_p[\Phi(u)] = \int \Phi(u)\, p\, d\mu < +\infty$ for a suitable function $\Phi$. If $\Phi(x) = |x|^a$, then the Lebesgue spaces $L^a$ are obtained.

A reference on Orlicz spaces is:

• M. M. Rao and Z. D. Ren. Applications of Orlicz spaces, volume 250 of Monographs and Textbooks in Pure and Applied Mathematics. Marcel Dekker Inc., New York, 2002.


Orlicz spaces

• The Young function $\Phi(x) = \cosh x - 1$ is used instead of the equivalent and more commonly used $e^{|x|} - |x| - 1$.

• $\Psi$ denotes its conjugate Young function, or the equivalent $(1 + |y|) \log(1 + |y|) - |y|$.

• A random variable $u$ belongs to the vector space $L^\Phi(p)$ if $E_p[\Phi(\alpha u)] < +\infty$ for some $\alpha > 0$.

• The closed unit ball of $L^\Phi(p)$ consists of all $u$ such that $E_p[\Phi(u)] \le 1$.

• The open unit ball $B(0, 1)$ consists of those $u$ such that $\alpha u$ is in the closed unit ball for some $\alpha > 1$.

• The Banach space $L^\Phi(p)$ is not separable, unless the sample space is finite. In this sense it is an unnatural choice.

• However, $L^\Phi(p)$ is natural for statistics because, for each $u \in L^\Phi(p)$, the Laplace transform of $u$ is well defined in a neighborhood of 0 and the one-dimensional exponential model $p(\theta) \propto e^{\theta u}$ is well defined around 0 (and vice versa).

• The space $L^\Psi(p)$ is separable and it is the pre-dual of $L^\Phi(p)$, with pairing $E_p[uv]$. For $1 < a < +\infty$, $L^\Phi(p) \subset L^a(p) \subset L^\Psi(p)$.
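On a finite sample space the description of the closed unit ball, $E_p[\Phi(u)] \le 1$, can be turned into a norm computation. The sketch below assumes the Luxemburg norm (the smallest $r > 0$ such that $u/r$ lies in the closed unit ball), which matches that description but is not named on the slide; the function names and the bisection tolerance are illustrative. It is meant for moderate values of $u$, since `math.cosh` overflows for very large arguments.

```python
import math


def young_phi(x):
    """The Young function Phi(x) = cosh(x) - 1."""
    return math.cosh(x) - 1.0


def expected_phi(u, p, scale):
    """E_p[Phi(u / scale)] on a finite sample space."""
    return sum(pi * young_phi(ui / scale) for ui, pi in zip(u, p))


def luxemburg_norm(u, p, tol=1e-12):
    """Smallest r > 0 with E_p[Phi(u / r)] <= 1, located by bisection."""
    if all(ui == 0 for ui in u):
        return 0.0
    lo, hi = 0.0, 1.0
    while expected_phi(u, p, hi) > 1.0:  # grow hi until u / hi is in the unit ball
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_phi(u, p, mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return hi


p = [0.2, 0.5, 0.3]
u = [1.0, -0.5, 2.0]
r = luxemburg_norm(u, p)
print(r, expected_phi(u, p, r))
```

At the computed value the constraint is active, so the second printed number is close to 1, and doubling `u` doubles the norm.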


Moment functional

• For each $p \in \mathcal{M}_>$, the moment functional is $M_p(u) = E_p[e^u]$.

• $M_p(0) = 1$; otherwise, for each centered random variable $u \neq 0$, $M_p(u) > 1$.

• $M_p$ is convex and lower semi-continuous, and its proper domain

\[ \operatorname{dom}(M_p) = \{ u \in L^\Phi(p \cdot \mu) : M_p(u) < \infty \} \]

is a convex set which contains the open unit ball $B_p(0, 1)$ of $L^\Phi(p)$.

Th $M_p$ is infinitely Gâteaux-differentiable in the interior of its proper domain, the $n$th derivative at $u \in \overset{\circ}{\operatorname{dom}}(M_p)$ in the direction $v \in L^\Phi(p)$ being

\[ \left. \frac{d^n}{dt^n} M_p(u + tv) \right|_{t=0} = E_p[v^n e^u]. \]

Th $M_p$ is bounded, infinitely Fréchet-differentiable and analytic on the open unit ball of $L^\Phi(p)$; the $n$th derivative at $u \in B(0, 1)$ evaluated in $(v_1, \dots, v_n) \in L^\Phi(p) \times \cdots \times L^\Phi(p)$ is

\[ D^n M_p(u)(v_1, \dots, v_n) = E_p[v_1 \cdots v_n e^u]. \]

In particular, $D^n M_p(0)(v_1, \dots, v_n) = E_p[v_1 \cdots v_n]$.
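The derivative formula $E_p[v^n e^u]$ can be sanity-checked on a three-point sample space by comparing it with finite differences of $t \mapsto M_p(u + tv)$. The density `p`, the vectors `u` and `v`, and the step `h` below are arbitrary choices for this sketch.

```python
import math

p = [0.2, 0.5, 0.3]
u = [0.4, -0.1, -0.1]
v = [1.0, -2.0, 0.5]


def moment(w):
    """M_p(w) = E_p[exp(w)]."""
    return sum(pi * math.exp(wi) for pi, wi in zip(p, w))


def directional_derivative(n):
    """E_p[v^n exp(u)], the stated n-th derivative of t -> M_p(u + t v) at t = 0."""
    return sum(pi * vi ** n * math.exp(ui) for pi, ui, vi in zip(p, u, v))


def shifted(t):
    """M_p(u + t v)."""
    return moment([ui + t * vi for ui, vi in zip(u, v)])


h = 1e-4
first_fd = (shifted(h) - shifted(-h)) / (2 * h)                     # central difference
second_fd = (shifted(h) - 2 * shifted(0.0) + shifted(-h)) / h ** 2  # second difference

print(first_fd, directional_derivative(1))
print(second_fd, directional_derivative(2))
```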


Connected component

• We associate to each density $p$ a space of $p$-centered random variables: scores, estimating functions, and so on. It is technically crucial to discuss how the relevant spaces depend on the density $p$.

• Given $p, q \in \mathcal{M}_>$, the exponential model $p(\theta) \propto p^{1-\theta} q^{\theta}$, $0 \le \theta \le 1$, connects the two given densities as end points of a curve. This curve need not be continuous, so we ask for more.

D We say that $p, q \in \mathcal{M}_>$ are connected by an open exponential arc if there exist $r \in \mathcal{M}_>$, $u \in L^\Phi_0(r)$ and an open interval $I$ that contains 0, such that $p(t) \propto e^{tu} \cdot r$, $t \in I$, is an exponential model containing both $p$ and $q$.

Th Let $p$ and $q$ be densities connected by an open exponential arc. Then the Banach spaces $L^\Phi(p)$ and $L^\Phi(q)$ are equal as vector spaces and their norms are equivalent.

Th For all $q$ that are connected to $p$ by an open exponential arc, the Orlicz space of centered random variables at $q$ is

\[ L^\Phi_0(q) = \left\{ u \in L^\Phi(p) \;\middle|\; E_p\left[ \frac{q}{p}\, u \right] = 0 \right\}, \]

then $E_p[{}^*k\, u] = 0$, ${}^*k = \frac{q}{p} - 1$, $u \in L^\Phi(p)$.


Cumulant functional

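• For each $p \in \mathcal{M}_>$ and each $u$ in the set $S_p$ of $p$-centered random variables in the interior of the proper domain of $M_p$, the cumulant functional is $K_p(u) = \log M_p(u)$.

• $K_p$ is infinitely Gâteaux-differentiable.

• $K_p$ is bounded, infinitely Fréchet-differentiable and analytic on the open unit ball of $L^\Phi_0(p)$.

• The mapping

\[ e_p : S_p \to \mathcal{M}_>, \quad u \mapsto e^{u - K_p(u)}\, p \]

is a parameterization of a subset of $\mathcal{M}_>$.

• If $e_p(S_p) = \mathcal{E}(p)$, the corresponding chart is

\[ s_p : \mathcal{E}(p) \ni q \mapsto \log\left(\frac{q}{p}\right) - E_p\left[\log\left(\frac{q}{p}\right)\right] \in S_p. \]

• If $q = e_p(u)$, then $K_p(u) = E_p\left[\log\left(\frac{p}{q}\right)\right]$ and

\[ DK_p(u)(v) = E_q[v], \]

\[ D^2K_p(u)(v, w) = \operatorname{Cov}_q(v, w) = E_q[vw] - E_q[v]\, E_q[w], \]

which reduces to $E_q[vw]$ when $v$ or $w$ is centered at $q$.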
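A finite sample space makes the chart and its inverse easy to test numerically. The sketch below uses an arbitrary three-point density `p`, maps a centered `u` to `q = e_p(u)`, recovers `u` with `s_p`, and checks $K_p(u) = E_p[\log(p/q)]$ and $DK_p(u)(v) = E_q[v]$ by a central difference. All names and numerical values are assumptions made for the illustration.

```python
import math

p = [0.2, 0.5, 0.3]


def expect(density, f):
    """Expectation of f under the given density."""
    return sum(d * x for d, x in zip(density, f))


def center(f):
    """Subtract the p-mean so that the result is p-centered."""
    m = expect(p, f)
    return [x - m for x in f]


def cumulant(w):
    """K_p(w) = log E_p[exp(w)]."""
    return math.log(expect(p, [math.exp(x) for x in w]))


def e_p(w):
    """Parameterization w -> exp(w - K_p(w)) * p."""
    k = cumulant(w)
    return [math.exp(x - k) * pi for x, pi in zip(w, p)]


def s_p(q):
    """Chart q -> log(q/p) - E_p[log(q/p)]."""
    return center([math.log(qi / pi) for qi, pi in zip(q, p)])


u = center([0.7, -0.3, 1.1])
q = e_p(u)
u_back = s_p(q)
k_from_q = expect(p, [math.log(pi / qi) for pi, qi in zip(p, q)])  # E_p[log(p/q)]

v = [1.0, -2.0, 0.5]
h = 1e-5
dk_fd = (cumulant([a + h * b for a, b in zip(u, v)])
         - cumulant([a - h * b for a, b in zip(u, v)])) / (2 * h)

print(u, u_back)
print(cumulant(u), k_from_q)
print(dk_fd, expect(q, v))
```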


Exponential manifold

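D For every $p \in \mathcal{M}_>$, the maximal exponential model at $p$ is defined to be the family of densities

\[ \mathcal{E}(p) := \left\{ e^{u - K_p(u)}\, p : u \in \overset{\circ}{\operatorname{dom}} K_p \right\} \subseteq \mathcal{M}_>. \]

Th The maximal exponential model at $p$, $\mathcal{E}(p)$, is equal to the set of all densities $q \in \mathcal{M}_>$ connected to $p$ by an open exponential arc.

Th The collection of charts $\{(\mathcal{E}(p), s_p) : p \in \mathcal{M}_>\}$ is an affine $C^\infty$ atlas on $\mathcal{M}_>$. The transition maps are

\[ s_{p_2} \circ e_{p_1} : u \mapsto u + \log\left(\frac{p_1}{p_2}\right) - E_{p_2}\left[ u + \log\left(\frac{p_1}{p_2}\right) \right]. \]

• The chart domains are either equal or disjoint because they correspond to the connected components (in the sense of open arcs).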

• E(p) is convex. reference measure.

• The derivative of the transition map $s_{p_2} \circ e_{p_1}$ is

\[ L^\Phi_0(p_1) \ni u \mapsto u - E_{p_2}[u] \in L^\Phi_0(p_2), \]

which is an isomorphism, because $p_1$ and $p_2$ are connected by an open exponential arc.


Divergence

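• For each non-negative density $q$ and each $p \in \mathcal{M}_>$, the divergence is defined by $D(q\|p) = E_q[\log(q/p)]$.

• Restricting to the manifold $\mathcal{M}_> \times \mathcal{M}_>$ and then to the proper domain, we can restrict to $\mathcal{E}(p) \times \mathcal{E}(p)$ and compute the representative in that chart. With $q_i = e_p(u_i)$, $i = 1, 2$,

\[ D(q_1\|q_2) = E_{q_1}[u_1 - u_2] - K_p(u_1) + K_p(u_2). \]

Then, the divergence, as a real function on $\mathcal{E}(p) \times \mathcal{E}(p)$, is infinitely Gâteaux-differentiable.

• Its partial derivatives are

\[ D_1 D(q_1\|q_2) \cdot v = \operatorname{Cov}_{q_1}[u_1 - u_2, v], \]

\[ D_2 D(q_1\|q_2) \cdot v = E_{q_2}[v] - E_{q_1}[v]. \]

• In particular, for $q = e_p(u)$, we can write

\[ D(q\|p) = E_p\left[ \left( \frac{q}{p} - 1 \right) u \right] - K_p(u), \]

which shows that $D(q\|p)$, as a function of $\frac{q}{p} - 1$, is the conjugate of the convex function $K_p$. Equivalently, since $D(p\|q) = K_p(u)$,

\[ D(q\|p) + D(p\|q) = E_p\left[ \left( \frac{q}{p} - 1 \right) u \right] = E_q[u]. \]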
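The chart expression of the divergence and the symmetrized identity can be verified on a small example. The sketch below uses an arbitrary three-point density `p` and two centered vectors `u1`, `u2` (all illustrative assumptions), builds $q_i = e_p(u_i)$, and compares the direct Kullback-Leibler sums with the formulas on this slide.

```python
import math

p = [0.2, 0.5, 0.3]


def expect(density, f):
    return sum(d * x for d, x in zip(density, f))


def center(f):
    """Subtract the p-mean so that the result is p-centered."""
    m = expect(p, f)
    return [x - m for x in f]


def cumulant(w):
    """K_p(w) = log E_p[exp(w)]."""
    return math.log(expect(p, [math.exp(x) for x in w]))


def e_p(w):
    """Parameterization w -> exp(w - K_p(w)) * p."""
    k = cumulant(w)
    return [math.exp(x - k) * pi for x, pi in zip(w, p)]


def kl(a, b):
    """D(a || b) = E_a[log(a / b)] for strictly positive densities."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))


u1 = center([0.7, -0.3, 1.1])
u2 = center([-0.2, 0.4, 0.0])
q1, q2 = e_p(u1), e_p(u2)

# D(q1 || q2) = E_{q1}[u1 - u2] - K_p(u1) + K_p(u2)
chart_value = (expect(q1, [a - b for a, b in zip(u1, u2)])
               - cumulant(u1) + cumulant(u2))

# D(q || p) + D(p || q) = E_q[u] for q = e_p(u)
symmetrized = kl(q1, p) + kl(p, q1)

print(kl(q1, q2), chart_value)
print(symmetrized, expect(q1, u1))
```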


The exponential geometry

The previous theory is intended to capture the essence of, and to generalize, the idea of curved exponential model as defined by

• B. Efron. Defining the curvature of a statistical problem (with applications to second-order efficiency). The Annals of Statistics, 3:1189–1242, 1975. (with discussion)

• B. Efron. The geometry of exponential families. Ann. Statist., 6(2):362–376, 1978;

• A. P. Dawid. Discussion of a paper by Bradley Efron. The Annals of Statistics, 3:1231–1234, 1975

• A. P. Dawid. Further comments on a paper by Bradley Efron. The Annals of Statistics, 5:1249, 1977

From the work of S.-I. Amari, we know that there is a second geometry on probabilities, whose geodesics are mixtures. This structure is a connection on a special vector bundle of the $\mathcal{M}_>$-manifold. A related manifold on the set $\mathcal{M}_1$ of normalized random variables is defined by charts of the form

\[ q \mapsto \frac{q}{p} - 1. \]

Locally, the $q$'s will have finite KL-divergence $D(q\|p)$.


Mixture manifold 1

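• We enlarge $\mathcal{M}_>$ by considering the sets

\[ \mathcal{M}_\ge = \left\{ p \in L^1(\mu) : p \ge 0, \int p\, d\mu = 1 \right\}, \qquad \mathcal{M}_1 = \left\{ p \in L^1(\mu) : \int p\, d\mu = 1 \right\}. \]

• For each $p \in \mathcal{M}_\ge$, we consider the subset of $L^\Psi_0(p)$ defined by

\[ {}^*\mathcal{E}(p) = \left\{ q \in P : \frac{q}{p} \in L^\Psi(p) \right\}. \]

On such a domain, we define the charts

\[ \eta_p : {}^*\mathcal{E}(p) \to L^\Psi_0(p), \quad q \mapsto \frac{q}{p} - 1, \]

and the associated parameterizations

\[ L^\Psi_0(p) \ni u \mapsto (u + 1)\, p \in {}^*U_p. \]

The collection of sets $\{{}^*\mathcal{E}(p)\}_{p \in \mathcal{M}_1}$ is a covering of $\mathcal{M}_1$:

• If $p \in \mathcal{M}$, then $\mathcal{E}(p) \subset {}^*\mathcal{E}(p)$.

• If $p_1, p_2 \in \mathcal{E}(p)$, then ${}^*\mathcal{E}(p_1) = {}^*\mathcal{E}(p_2)$.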


Mixture manifold 2

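Th The set of charts $\{({}^*\mathcal{E}(q), \eta_q) : q \in \mathcal{E}(p)\}$ is an affine $C^\infty$-atlas on ${}^*\mathcal{E}(p)$, so it defines a manifold modeled on the Banach space $L^\Psi_0(p)$.

• For each pair $p_1, p_2 \in \mathcal{E}(p)$ the transition map is

\[ \eta_{p_2} \circ \eta_{p_1}^{-1} : L^\Psi_0(p_1) \to L^\Psi_0(p_2), \quad u \mapsto u\, \frac{p_1}{p_2} + \frac{p_1}{p_2} - 1. \]

Th For each $q \in \mathcal{M}_\ge$, the divergence $D(q\|p)$ is finite if and only if $q \in {}^*\mathcal{E}(p)$:

\[ D(q\|p) = E_p\left[ \frac{q}{p} \log\left( \frac{q}{p} \right) \right] < \infty \iff q \in {}^*\mathcal{E}(p). \]

Th For each density $p \in \mathcal{M}_\mu$, the inclusion $j : \mathcal{E}(p) \hookrightarrow {}^*\mathcal{E}(p)$ is of class $C^\infty$.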
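The transition map between two mixture charts is affine, and on a finite sample space the formula can be compared directly with the composition of a chart and a parameterization. The densities `p1`, `p2`, `q` below are arbitrary illustrative choices.

```python
p1 = [0.2, 0.5, 0.3]
p2 = [0.4, 0.4, 0.2]
q = [0.1, 0.6, 0.3]


def eta(p, density):
    """Mixture chart eta_p(q) = q/p - 1."""
    return [qi / pi - 1.0 for qi, pi in zip(density, p)]


def eta_inverse(p, u):
    """Parameterization u -> (u + 1) p."""
    return [(ui + 1.0) * pi for ui, pi in zip(u, p)]


def transition(u):
    """eta_{p2} o eta_{p1}^{-1}: u -> u p1/p2 + p1/p2 - 1."""
    return [ui * a / b + a / b - 1.0 for ui, a, b in zip(u, p1, p2)]


u = eta(p1, q)
via_formula = transition(u)
via_charts = eta(p2, eta_inverse(p1, u))

# The chart takes values in the p1-centered random variables.
mean_under_p1 = sum(ui * pi for ui, pi in zip(u, p1))

print(via_formula)
print(via_charts)
print(mean_under_p1)
```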


Exponential models

• Let $V$ be a subspace of $L^\Phi_0(p)$. We call exponential model based on $V$ the “flat manifold”

\[ \mathcal{E}_V(p) = \left\{ e^{u - K_p(u)}\, p \;\middle|\; u \in V \cap S_p \right\}. \]

• Let

\[ V^\perp = \left\{ k \in L^\Psi_0(p) \;\middle|\; E_p[ku] = 0, \ u \in V \right\} \]

be the orthogonal space of $V$. If $V$ is closed, then

\[ q \in \mathcal{E}_V(p) \iff E_p\left[ k \log \frac{q}{p} \right] = 0, \quad k \in V^\perp. \]

• Note that $V + V^\perp$ is not a splitting of the space $L^\Phi_0(p)$. In fact, the proper notion of statistical model appears to be different from what is usually termed a sub-manifold, because, in general, there is no orthogonal splitting of subspaces in $L^\Phi(p)$. The proper splitting space consists of the orthogonal space of the space of the model in the pre-dual space $L^\Psi_0(p)$.

• If we take $k = \frac{q}{p} - 1$, $q \in {}^*\mathcal{E}(p)$, then the orthogonality becomes a special case of the Csiszár Pythagorean theorem

\[ D(q\|p) + D(p\|q) = 0. \]


Tangent space

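• Let $p(\theta)$, $\theta \in I$, be a one-dimensional statistical model. If $p(0) = p$ and $p(\theta) \in \mathcal{E}(p)$, then $p(\theta) = e^{u(\theta) - K_p(u(\theta))}\, p$. The tangent vector at $\theta$ is computed in the exponential chart centered at $p$ as

\[ T_{p(\theta)}\, p(\cdot) = \frac{d}{d\theta} u(\theta) = \frac{d}{d\theta} \log\left( \frac{p(\theta)}{p} \right) + E_{p(\theta)}\left[ \frac{d}{d\theta} u(\theta) \right] \]

and, at $\theta = 0$,

\[ T_p\, p(\cdot) = \left. \frac{d}{d\theta} u(\theta) \right|_{\theta=0} = \left. \frac{d}{d\theta} \log\left( \frac{p(\theta)}{p} \right) \right|_{\theta=0}. \]

• The exponential model

\[ e^{\theta u'(0) - K_p(\theta u'(0))}\, p, \quad \theta \in I, \]

is the tangent exponential model.

• The statistical model

\[ \{ q \in {}^*\mathcal{E}(p) \mid E_q[u'(0)] = 0 \} \]

is the orthogonal mixture model.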


Classical exponential models

• A classical exponential model is

\[ p(x; \theta) = \exp\left( \sum_{i=1}^d \theta_i T_i(x) - \Psi(\theta) \right), \quad \theta \in \Theta, \]

where $\Psi$ is the logarithm of the Laplace transform of $T_*\mu$, the distribution of $T$, and $\Theta$ is its domain.

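• If $H = \operatorname{Span}(1, T_1, \dots, T_d)$, the membership of $p_\theta$ in the model is equivalent to

\[ \log p(x; \theta) = \sum_{i=1}^d \theta_i T_i(x) - \Psi(\theta) \in H. \]

• The coordinate in the exponential manifold is

\[ s(p_\theta) = \sum_{i=1}^d \theta_i \left( T_i(x) - \frac{\partial}{\partial\theta_i} \Psi(\theta) \right) \]

and the cumulant function, obtained by matching $u - K(u)$ with $\log p(x; \theta)$ for $u = s(p_\theta)$, is

\[ K(u) = \Psi(\theta) - \sum_{i=1}^d \theta_i\, \partial_{\theta_i} \Psi(\theta). \]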


Sufficiency

• The supporting probability space can be reduced by sufficiency. In fact, if $\mathcal{T} = \sigma(T_1, \dots, T_d)$ is the sufficient σ-algebra, it suffices to consider the smaller probability space $(T(X), \mathcal{T}, \mu|_{\mathcal{T}})$.

• Assume that all the $T_i$'s take a finite number of values. Then, the space of measurable functions which are functions of $T$ is the algebra generated by $T_1, \dots, T_d$. If a smaller number of generators can be found, this will lead to a reduction of the dimension of the model space.

• The computation of the sufficient algebra, of a generating set of this algebra, and of a linear basis are all relevant, because every statistic based on the likelihood will be a member of this algebra.

• More specifically, assume that the original sample space is finite and that all values of the $T_i$ are rational. Then, we can map each probability of the original model to a finite subset $D$ of $\mathbb{Q}^d$ to obtain the equivalent exponential model

\[ p(t; \theta) = \exp\left( \sum_{i=1}^d \theta_i t_i - \Psi(\theta) \right) T_*\mu(t), \quad t \in D. \]


Design theory

• In algebraic Design Theory, the points of a finite state space (or treatment space) are labelled with points in $\mathbb{Q}^d$. In turn, the sample space is described as the 0-dimensional algebraic variety of an ideal $I$ of the ring $R = \mathbb{Q}[x_1, \dots, x_d]$. The algebra of real functions on the sample space is described by the quotient ring $R/I$ and has a hierarchical linear basis of monomials. Other labels have proved to be useful, in particular the $n$-th roots of unity.

• This same approach should lead to a useful algebraic presentation of the sufficient algebra of exponential models on a finite sample space.

• G. Pistone, E. Riccomagno, and H. P. Wynn. Algebraic Statistics: Computational Commutative Algebra in Statistics. Chapman&Hall, 2001

• G. Pistone and M. Rogantin. Indicator function and complex coding for mixed fractional factorial designs (revised 2006). Technical Report 17, Dipartimento di Matematica, Politecnico di Torino, 2006

• H. Maruri-Aguilar, R. Notari and E. Riccomagno. “On the description and identifiability analysis of mixture designs” (Accepted for publication in Statistica Sinica 2007)


Example: Exponential versus toric

• $\mathcal{X}$ is a finite sample space and $p$ is a given probability density. In particular, consider a multi-way contingency table identified by a collection of factors $X = \{X_1, \dots, X_F\}$. If $I_f$ denotes the set of levels for the factor $X_f$, $f = 1, \dots, F$, the state space is a product space, i.e. $\mathcal{X} = \times_{f=1}^F I_f$.

• A log-linear model assumes that $p(x) > 0$ and that $\log p(x)$ belongs to a linear subspace $H$ of $\mathbb{R}^{\mathcal{X}}$. If $H$ is spanned by $\{T_1, \dots, T_s\}$, where the $T_j$'s are integer valued functions, we can write the log-linear model as

\[ \log p(x) = \sum_{j=1}^s (\log \zeta_j)\, T_j(x). \]

• The unnormalized probability $q$ satisfies the same equation, written in monomial form as

\[ q(x) = \zeta_1^{T_1(x)} \cdots \zeta_s^{T_s(x)}, \quad \zeta_j \ge 0, \ j = 1, \dots, s, \]

where the parameters $\zeta_1, \dots, \zeta_s$ are subject to the non-negativity constraint only.


Example: orthogonality

• A third expression of the same model can be derived by elimination of the indeterminates $\zeta_1, \dots, \zeta_s$ in the monomial parameterization equation. In fact, if $M = [T_1(x) \cdots T_s(x)]_{x \in Q}$ is the design matrix, the orthogonal space of its range can be generated by integer valued vectors with zero sum, $K = [k_1 \cdots k_r]$. Then, for positive parameters,

\[ \prod_x q(x)^{k_j(x)} = \prod_x \left( \zeta_1^{T_1(x)} \cdots \zeta_s^{T_s(x)} \right)^{k_j(x)} = \zeta_1^{T_1 \cdot k_j} \cdots \zeta_s^{T_s \cdot k_j} = 1, \]

where $T_i \cdot k_j = \sum_x T_i(x)\, k_j(x) = 0$ by orthogonality.

• As the sum of the elements of each $k_j$, $j = 1, \dots, r$, is zero, the sums of the elements of the positive part $k_j^+$ and of the negative part $k_j^-$ are equal, so that we could write

\[ \prod_x q(x)^{k_j^+(x)} - \prod_x q(x)^{k_j^-(x)} = 0, \quad j = 1, \dots, r. \]

• It follows that the toric model implies a set of $r$ binomial and homogeneous equations in the un-normalized probabilities $q(x)$, $x \in Q$.

– Given a finite state log-linear model and all its limit points, a specific set of configurations of zero-probability cells arises. This set cannot be recovered by setting to zero some parameters in a generic toric parametric representation.
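The elimination argument can be illustrated numerically with the quasi-independence example from the earlier slides. The sketch below reuses that design matrix and the two kernel vectors, evaluates the monomial model at parameter values chosen arbitrarily for the illustration, and checks that each binomial $\prod_x q(x)^{k_j^+(x)} - \prod_x q(x)^{k_j^-(x)}$ vanishes, both for positive parameters and when one parameter is set to zero. It relies on `math.prod`, available from Python 3.8.

```python
import math

# Design matrix of the QI example: rows are cells 11, 21, 31, 51, 12, 22, 42, 52.
M = [
    [1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 1],
]

# Integer vectors orthogonal to the columns of M (the columns of K).
K_COLUMNS = [
    (1, -1, 0, 0, -1, 1, 0, 0),
    (0, -1, 0, 1, 0, 1, 0, -1),
]


def monomial_model(zeta):
    """q(x) = prod_j zeta_j ** T_j(x), one value per row x of the design matrix."""
    return [math.prod(z ** e for z, e in zip(zeta, row)) for row in M]


def binomial_residual(q, k):
    """prod_x q(x)^{k+(x)} - prod_x q(x)^{k-(x)}."""
    plus = math.prod(qx ** max(e, 0) for qx, e in zip(q, k))
    minus = math.prod(qx ** max(-e, 0) for qx, e in zip(q, k))
    return plus - minus


positive = monomial_model((2.0, 0.5, 1.5, 3.0, 0.25, 1.0, 4.0))
boundary = monomial_model((2.0, 0.5, 1.5, 3.0, 0.0, 1.0, 4.0))  # fifth parameter set to 0

residuals_positive = [binomial_residual(positive, k) for k in K_COLUMNS]
residuals_boundary = [binomial_residual(boundary, k) for k in K_COLUMNS]
print(residuals_positive, residuals_boundary)
```

The binomial form remains valid on the boundary, where the quotient form $\prod_x q(x)^{k_j(x)} = 1$ is no longer defined.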


Example: instances

• However, there exists a “maximal” parametric toric representation that gives a full parameterization of the extended model.

1. All toric models compatible with the implicit binomial model are characterized by a string of exponents of the $T$'s, which is a non-negative integer vector orthogonal to the vectors $[k_1 \dots k_r]$.

2. The lattice of non-negative integer vectors $t \in \mathbb{N}^Q_+$ such that the condition $t \cdot k_j = 0$ holds for each $j = 1, \dots, r$ has a finite number of generators that can be computed with symbolic software.

3. If the generators are $S_1, \dots, S_u$, then the “maximal” toric model is

\[ q(x) = \zeta_1^{S_1(x)} \cdots \zeta_u^{S_u(x)}, \quad x \in Q. \]

• If, for example, $\zeta_1 = 0$, the support of the resulting probability will be the set $Q_1 = \{x \in Q : S_1(x) = 0\}$. On such a restricted support, the model will be again toric,

\[ q(x) = \zeta_2^{S_2(x)} \cdots \zeta_u^{S_u(x)}, \quad x \in Q_1, \]

or exponential if all the other parameters $\zeta_2, \dots, \zeta_u$ are assumed to be strictly positive.

• In this sense, we say that each toric model is a union of exponential models with different supports. Each one of these models is called an instance of the model.
