Page 1

Algebraic Methods in Statistics

Suree Chooprateep, Jinhua Fang, Joe Masaro,

and Chi Song Wong*


Page 2

We shall use an example (or model) in weighing designs to illustrate the ideas involved and give no non-trivial proofs.

I have given many such talks since 1982; the transparencies were handwritten by Joe Masaro who, at the time, was my Ph.D. student.


With some modifications, I used these transparencies until Suree used PowerPoint in 2007 to prepare a new set. Before that, the longest talk using transparencies was the invited talk given at a conference at the Fields Institute, Toronto, October 25-30, 1999.

p objects 1,2,...,p with weights b1,b2,…,bp are being weighed on a chemical balance n times.

Page 4

The net weight of the ith weighing is

b1 x_i1 + b2 x_i2 + … + bp x_ip,

where x_ij is -1, 1 or 0 according as, at the ith weighing, the object j is placed on the left pan, the right pan, or not weighed.

Page 5

• Y_i is the reading of the ith net weight (before it is observed), and

Y_i = b1 x_i1 + b2 x_i2 + … + bp x_ip + ε_i,

where ε_i is the error in reading; we assume that the errors ε_i are independent, normally distributed random variables.

Page 6

• Y = Xb + ε ~ N(Xb, σ²I_n) is a linear model;

X is referred to as a design or a design matrix.

Page 7

With a given design X, we shall use the blue of b to estimate b (blue: best linear unbiased estimator):

b̂_X(Y) = (X'X)⁻¹X'Y.

Page 8
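As a numerical sketch (my own toy example, not from the talk), the blue b̂_X(Y) = (X'X)⁻¹X'Y can be computed for a small hypothetical weighing design:

```python
import numpy as np

# Hypothetical 4x3 chemical-balance design: row i records where each of the
# p = 3 objects sits at the ith weighing (+1 right pan, -1 left pan, 0 off).
X = np.array([[1,  1,  1],
              [1, -1,  1],
              [1,  1, -1],
              [1, -1, -1]], dtype=float)

rng = np.random.default_rng(0)
b_true = np.array([2.0, 5.0, 3.0])           # unknown weights b
Y = X @ b_true + rng.normal(0, 0.1, size=4)  # readings with normal errors

# blue of b: b_hat = (X'X)^{-1} X'Y
b_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(b_hat)  # close to b_true
```

For this particular X the columns are orthogonal, so X'X = 4I_3 and each estimated weight uses every weighing; that is exactly the kind of design the optimality criteria below reward.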

Our design X is restricted to those with rank p. A design X0 is said to be "optimal" if, among all X, b̂_X0(Y) is the "best" estimator for b.

• The word "optimal" will soon be clarified to some extent by the following (1 − α)-level confidence ellipsoid for b:

Page 9

{b : (b̂_X(Y) − b)'(X'X)(b̂_X(Y) − b) ≤ p σ̂²(Y) f_{p, n−p, α}}, X in C_{n,p}.

• b̂_X0(Y) is "best" if the size of the ellipsoid E_{(X0'X0)⁻¹} is the "smallest".

• The "size" of E_{(X'X)⁻¹} is determined by (X'X)⁻¹.

Page 10

Size of ellipsoids

• x1²/4 + x2²/12 = 1 (5)

• x1²/6 + x2²/15 = 1 (6)

• x1²/11 + x2²/11 = 1 (7)

Page 11

Page 12

Let C be a set of ellipsoids:

E_B = {x : (x − a)'B⁻¹(x − a) ≤ 1}, where B = (X'X)⁻¹ and a = b̂_X(Y).

Page 13

D-optimal design

• E_B1 is said to be D-as small as E_B2 if the volume of E_B1 is equal to or less than the volume of E_B2.

• X0 is D-optimal if the volume of the corresponding E_B, with B = (X'X)⁻¹ and X = X0, is the smallest.

Page 14

E-optimal design

• E_B1 is said to be E-as small as E_B2 if the maximum eigenvalue of B1 is equal to or less than the maximum eigenvalue of B2.

• X0 is E-optimal if the maximum eigenvalue of the corresponding B = (X'X)⁻¹, with X = X0, is the smallest.

Page 15

A-optimal design

• E_B1 is said to be A-as small as E_B2 if the trace of B1 is equal to or less than the trace of B2.

• X0 is A-optimal if the trace of the corresponding B = (X'X)⁻¹, with X = X0, is the smallest.

Page 16
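To make the three criteria concrete, here is a small comparison of my own (two hypothetical 4×2 designs, not from the talk): each criterion is a scalar summary of B = (X'X)⁻¹, and smaller is better.

```python
import numpy as np

def criteria(X):
    """Return (D, E, A) criterion values of B = (X'X)^{-1}:
    det(B) drives the ellipsoid volume, the largest eigenvalue of B
    its longest axis, and trace(B) the sum of estimator variances."""
    B = np.linalg.inv(X.T @ X)
    return np.linalg.det(B), np.linalg.eigvalsh(B).max(), np.trace(B)

# Two hypothetical designs for p = 2 objects weighed n = 4 times.
X1 = np.array([[1, 1], [1, -1], [1, 1], [1, -1]], dtype=float)  # orthogonal columns
X2 = np.array([[1, 1], [1, -1], [1, 1], [1,  1]], dtype=float)

print(criteria(X1))
print(criteria(X2))
```

Here the orthogonal design X1 wins on all three counts; in general the criteria can disagree, which is why A-, D- and E-optimality are distinct notions.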

Optimal designs or optimality

Our contributions in optimal designs or optimality can be found in:

• Masaro, J. and Wong, C. S. (1992). Type I optimal weighing designs when N ≡ 3 (mod 4), Utilitas Mathematica 41, 97-107.

• Wong, C. S. and Masaro, J. (1991). On the optimality properties of A-optimal designs, Pure and Anal. Math. 7, 1-12.

Page 17

Optimal designs or optimality

• Chi Song Wong (1990). Extensions of tr((A⁻¹-B⁻¹)(A-B)) < 0 for covariance matrices A, B, Linear Alg. Appl. 127, 517-529.

• Chi Song Wong (1985). On the use of differentials in statistics, Linear Alg. Appl. 70, 285-299.

• Ching-Shiu Cheng, Joe Masaro and Chi Song Wong (1985). Optimal weighing designs, SIAM J. Alg. Disc. Meth. 6(2), 259-267.

Page 18

Optimal designs or optimality

• Ching-Shiu Cheng, Joe Masaro and Chi Song Wong (1985). Do nearly balanced graphs have more spanning trees?, Graph Theory 8, 342-345.

• Chi Song Wong and Joe Masaro (1984). A-optimal design matrices, Discrete Math. 50, 295-318.

• Chi Song Wong and Joe Masaro (1984). A-optimal design matrices X = (x_ij) with x_ij = -1, 0, 1, Linear and Multilinear Algebra 15, 23-46.

 

Page 19

Optimal designs or optimality

• M. Jacroux, C. S. Wong and Joe Masaro (1983). On the optimality of chemical balance weighing designs, J. Statist. Plann. Inference 8, 231-240.

• Chi Song Wong (1980). Matrix derivatives and its application in statistics, J. Math. Psychology 22, 70-81.

• Chi Song Wong and Kai Sang Wong (1979). A first derivative test for the maximum likelihood estimates, Bull. of the Inst. of Math., Academia Sinica 7, 313-321.

• Chi Song Wong and K. S. Wong (1980). Minima and maxima in multivariate analysis, Canad. J. Statist. 8, 103-113.

• N. Chan and Chi Song Wong (1980). Existence of an A-optimal model for a regression experiment, J. Math. Anal. Appl. 11, 403-415.

• D. S. Chang and C. S. Wong (1980). Correction to "Design of optimal control for a regression problem", Ann. Statist. 8, 1403.

 

Page 21

Optimal designs or optimality

Optimal designs for correlated random variables

• Masaro, J. and Wong, C. S. (2008). Robustness of A-optimal designs for correlated random variables, Linear Algebra Appl. 429, 1392-1408.

• Masaro, J. and Wong, C. S. (2008). D-optimal designs for correlated random variables, J. Statist. Plann. Inference 138, 4093-4106.

• Masaro, J. and Wong, C. S. (2008). Robustness of optimal designs for correlated random variables, Linear Algebra Appl. 429, 1639-1646.

Page 22

Hadamard matrix

• A-, D- and E-optimality criteria are not equivalent. However, in C_{n,p}, if there exists a matrix X such that X'X = nI_p, then X is A-, D- and E-optimal.

• When p = n, such an X is called a Hadamard matrix of order n and its existence implies that n = 1, 2 or a multiple of 4.

Page 23
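A quick sketch (using the standard Sylvester construction, my own addition, not mentioned in the slides) that builds a Hadamard matrix of order 4 and checks X'X = nI_n:

```python
import numpy as np

# Sylvester construction: H_{2n} = [[H, H], [H, -H]], one standard way
# to build Hadamard matrices of orders 2^k.
H = np.array([[1.0]])
for _ in range(2):  # order 1 -> 2 -> 4
    H = np.block([[H, H], [H, -H]])

n = H.shape[0]      # n = 4
print(H.astype(int))
print(H.T @ H)      # = n * I_n, so H is A-, D- and E-optimal in C_{n,n}
```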

Majorization

• Majorization or, equivalently, redistribution of wealth, is important for obtaining optimal designs and many inequalities.

• Majorization is a branch of mathematics in its own right and is based on linear and nonlinear functional analysis; see, e.g.,

Page 24

Majorization

• Wong, C. S. and Cheng H. (1997). Vector majorization via non-negative definite doubly stochastic matrices of maximum rank, Linear Alg. Appl. 261, 187-194.

• Chao, K. and Wong, C. S. (1992). Applications of M- matrices to majorization, Linear Alg. Appl. 169, 31-40.

• Chen, L. and Wong, C. S. (1992). Inequalities for singular values and trace, Linear Alg. Appl. 171, 109-120.

Page 25

Majorization

• Chi Song Wong (1990). Extensions of tr((A⁻¹-B⁻¹)(A-B)) < 0 for covariance matrices A, B, Linear Alg. Appl. 127, 517-529.

• Wong, C. S. (1986) Modern Analysis and Algebra. Xidan University Press, Xian.
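As a small illustration of vector majorization (my own sketch; the definition via decreasing partial sums is standard but is not spelled out in the slides):

```python
import numpy as np

def majorizes(x, y):
    """Check whether x majorizes y (real vectors of equal length):
    equal totals, and the partial sums of x sorted decreasingly
    dominate those of y."""
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    if not np.isclose(xs.sum(), ys.sum()):
        return False
    return bool(np.all(np.cumsum(xs) >= np.cumsum(ys) - 1e-12))

# "Redistribution of wealth": the uneven split (3, 1) majorizes
# the more even split (2, 2), but not conversely.
print(majorizes([3, 1], [2, 2]))   # True
print(majorizes([2, 2], [3, 1]))   # False
```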

Testing hypothesis

• For a linear model Y, we are interested in testing a hypothesis H0: K'b = b0 (via a translation, we may assume that b0 = 0), where K'b is estimable, i.e., we can choose K' such that K' = T'X (or K' = TX'X) for some matrix T; equivalently, K'b can be expressed as a linear combination of the mean of Y, i.e., the row space of K' is included in the row space of X.

Page 27

Testing hypothesis

• Here we no longer assume that X has full column rank, i.e., b may not be estimable. There are many equivalent conditions for estimability:

• Chi Song Wong (1989). Linear models in a general parametric form, Commun. in Statist.-Theory and Methods 18(9), 3095-3115.

• Wong, C. S. (1993). Linear models in a general parametric form, Sankhya A 55, 130-149.

 

Testing hypothesis

• When S and S0 are given via the design matrix X with some constraints (such as K'b = a) on Xb, there exist formulae to calculate Proj(y to S) and Proj(y to S0) and to check estimability; see Wong, C. S. (1993) and

• Wong C. S. and Cheng, H. (2001). Estimation in a growth curve model with singular covariance, J. Statist. Plann. Inference, 97, 323-342.

Page 29

Least squares

Let y = OP be a realization of Y and OA be the orthogonal projection of y onto the column space S of X. Then y = OA + AP and AP is perpendicular to S, i.e., (y − OA)'Xb = 0 for every b in p-dimensional Euclidean space or, equivalently,

(y − OA)'X = 0.

Page 30

Least squares

• X'(y − OA) = 0,

• X'y = X'OA,

• X'y = X'Xb(y) for some b(y),

• b(y) = G(X'X)X'y (normal equation), where G(X'X) is a generalized inverse of X'X;

upon multiplying by X on the left,

Page 31

Least squares

OA = Py,

where

P = XG(X'X)X' (the matrix representation [Proj(y to S)] of Proj(y to S), without constraint). Thus P = XG(X'X)X' is independent of the choice of G(X'X) and is a symmetric idempotent.
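The independence of P from the choice of generalized inverse can be checked numerically; the design below is a hypothetical rank-deficient example of mine, with np.linalg.pinv as one choice of G(X'X) and a second g-inverse built from a null vector of X'X:

```python
import numpy as np

# Hypothetical rank-deficient design: column 3 = column 1 + column 2,
# so X'X is singular and a generalized inverse G(X'X) is needed.
X = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 1, 1]], dtype=float)

G = np.linalg.pinv(X.T @ X)   # one choice of generalized inverse
P = X @ G @ X.T               # matrix of Proj(y to S), S = column space of X

# Another g-inverse: G2 = G + v v', where X v = 0 (so (X'X) G2 (X'X) = X'X).
v = np.array([1.0, 1.0, -1.0])
G2 = G + np.outer(v, v)

# P is a symmetric idempotent and does not depend on the choice of G(X'X).
print(np.allclose(P, P.T), np.allclose(P @ P, P))
print(np.allclose(X @ G2 @ X.T, P))
```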

For the use of the above matrix representation, with or without constraints, we refer to:

Page 32

Least squares

• Wong, C. S. and Cheng, H. (2001). Estimation in a growth curve model with singular covariance, J. Statist. Plann. Inference 97, 323-342.

• Chi Song Wong (2000). Probability. Hunan Science and Technology Publisher.

Now,

the square of ||AA'|| = Q(b(y)),

Page 33

Testing hypothesis

where Q(b) = <K'b, G(K'G(X'X)K)K'b> and <·,·> is the usual inner product. Some of my students published my results or even wrote a Ph.D. dissertation based on special cases of results in Masaro and Wong (1999).

Page 34

Testing hypothesis

• To illustrate the F-test, let S be the column space of X and S0 be the space of all Xb in S that satisfy H0. Since S0 is S under H0, it seems reasonable to reject H0 if the square of the length ||AA'|| is large enough (compared with ||PA||), where O is the origin, vector OP = y, and vector PA (PA'), denoted by Proj(OP to S) (Proj(OP to S0)), is perpendicular to S (S0) with A in S and A' in S0.

Page 35

Testing hypothesis

Thus vector AA' = Proj(y to the orthogonal complement of S0 in S) and we reject H0 when Q = the square of ||AA'|| is large enough. This gives rise to the usual F-test: reject H0 if f > f0, where

f = c · (square of ||AA'||)/(square of ||PA||)

and c is a constant determined by the given type I risk alpha. Here we note that, up to scale, the square of ||AA'|| (||PA||) has a chi-square distribution with s (n − r) degrees of freedom, where s (r) is the rank of K' (X).
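A toy version of this geometric F-test (my own numbers; H0 here sets the slope of a simple regression to zero, so s = 1 and r = 2):

```python
import numpy as np
from numpy.linalg import pinv

rng = np.random.default_rng(1)
n = 8
X = np.column_stack([np.ones(n), np.arange(float(n))])
y = 1.0 + rng.normal(0, 1, n)       # data generated with slope 0, so H0 holds

P = X @ pinv(X.T @ X) @ X.T         # Proj onto S = col(X)
X0 = X[:, :1]                       # model under H0: intercept only
P0 = X0 @ pinv(X0.T @ X0) @ X0.T    # Proj onto S0

AA = P @ y - P0 @ y   # vector AA': Proj(y to orthogonal complement of S0 in S)
PA = y - P @ y        # residual vector PA, perpendicular to S
s, r = 1, 2           # rank of K', rank of X
f = (AA @ AA / s) / (PA @ PA / (n - r))
print(f)              # compare with the F(s, n-r) critical value f0
```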

Page 36


Testing hypothesis

• Now we are interested in hypotheses H0i: Ki'b = ai, i = 1, 2, …, k. Let Qi be the Q corresponding to hypothesis H0i. Often we choose the ai's to be orthogonal contrasts so that the Qi's are (stochastically) independent. This leads to the theory of analysis of variance (ANOVA) or, more generally,

Page 38

Testing hypothesis

Multivariate Analysis of Variance (MANOVA): decompose a chi-squared (Wishart) Y'WY into a finite sum of independent chi-squared (Wishart) Y'WiY, i = 1, 2, …, k.

The first result was due to Cochran (1934): In the above setting, assume that each Wi is symmetric, I = W1 + W2 + … + Wk and σ = 1. Then

Page 39

Cochran theorem

The Y'WiY's are independently chi-squared distributed if and only if

(1) each Wi is a projection (WiWi = Wi), and

(2) WiWi' = 0 for distinct i, i'.
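The one-way decomposition I = W1 + W2 into the mean projector and its complement is a standard example (my own sketch, not from the slides) on which both Cochran conditions can be verified:

```python
import numpy as np

n = 6
J = np.ones((n, n)) / n          # projection onto the constant vector
W1, W2 = J, np.eye(n) - J        # I = W1 + W2

# Cochran's conditions for independent chi-squared quadratic forms:
ok_proj = np.allclose(W1 @ W1, W1) and np.allclose(W2 @ W2, W2)   # (1)
ok_orth = np.allclose(W1 @ W2, 0)                                 # (2)
print(ok_proj, ok_orth)

# The degrees of freedom are the ranks: 1 and n - 1.
print(np.linalg.matrix_rank(W1), np.linalg.matrix_rank(W2))
```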

Cochran's theorem was generalized by numerous authors; see, e.g.,

Page 40

Cochran theorem

• Masaro, J. and Wong, C. S. (2003), Wishart distributions associated with matrix quadratic forms, J. Multivariate Anal. 85, 1-9.

• Wong, C. S. and Cheng, H. (1999). Cochran theorems for multivariate elliptically contoured models II, J. Statist. Plann. Inference 79, 299-324.


Cochran theorem

• Wong, C. S., Cheng, H. and Masaro, J. (1998). Multivariate versions of Cochran theorems, Linear Algebra Appl. 291, 227-234.

• Wong, C. S. and Cheng, H. (1998). Cochran theorem to elliptically contoured distributions of gamma type, Sankhya B, 60, 407-432.

Page 42

Cochran theorem

• Wang, T., Fu, R. and Wong, C. S. (1996). Cochran theorems for multivariate components of variance models, Sankhya A 58, 328-342.

• Wong, C. S. and Wang, T. (1995). Laplace-Wishart distributions and Cochran theorems, Sankhya A 57, 342-359.

Cochran theorem

• Wang, T. and Wong, C. S. (1995). Cochran theorems for multivariate elliptically contoured models, J. Statist. Plann. Inference 43, 257-270.

Wong, C. S., Masaro, J. and Deng, W. (1995). Estimating  covariance in a growth curve model, Linear Alg. Appl. 214, 103-118.

Cochran theorem

• Wong, C. S. and Wang, T. (1993). Multivariate versions of Cochran theorems II, J. Multivariate Anal. 44, 146-159.

• Wong, C. S., Masaro, J. and Wang, T. (1991). Multivariate versions of Cochran's theorem, J. Multivariate Anal. 39, 154-174.

• Chi Song Wong (1982). Characterizations of products of symmetric matrices, Linear Alg. Appl. 42, 243-251.

Page 45

Jordan algebra

Since the Wi's are symmetric, it is desirable that the underlying theory not go beyond the set Sn of symmetric matrices; thus the usual matrix multiplication is replaced by a Jordan product ∘: for A, B in Sn,

A∘B = (AB + BA)/2,

Page 46

Jordan algebra

or, more generally, for some C in Sn,

A∘B = (ACB + BCA)/2

for all A, B in Sn. (To avoid ambiguity, we may index ∘ and refer to it as the Jordan product induced by C.) Then Sn with ∘ and its usual vector space structure is referred to as a Jordan algebra, where ∘ is commutative but no longer associative.
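A short numerical check (my own sketch) that the Jordan product is commutative but, unlike matrix multiplication, not associative:

```python
import numpy as np

def sym(M):
    """Symmetrize: map any square matrix into Sn."""
    return (M + M.T) / 2

def jordan(A, B):
    """Jordan product A o B = (AB + BA)/2 (induced by C = I)."""
    return (A @ B + B @ A) / 2

rng = np.random.default_rng(2)
A, B, C = (sym(rng.normal(size=(3, 3))) for _ in range(3))

print(np.allclose(jordan(A, B), jordan(B, A)))                        # commutative
print(np.allclose(jordan(jordan(A, B), C), jordan(A, jordan(B, C))))  # fails in general
```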

Page 47

Jordan algebra

Historically, the general notion of a Jordan algebra in mathematics was motivated by the above (classical) Jordan algebra; see, e.g.,

J. Faraut and A. Koranyi (1994). Analysis on Symmetric Cones, Oxford University Press.

Cochran theorem

• We shall now present the general Cochran theorem, Theorem 4.8 in Masaro and Wong (2010), without proof:

• For a multivariate n by p normal Y and a quadratic Y'WY with W symmetric, note that the map φ with φ(t) = W ⊗ t is linear on Sp. Consider the corresponding quadratic Q:

Cochran theorem

• Q(t) = <Y, φ(t)(Y)>.

Assume the covariance of Y to be the identity. Then:

• Y'WY is Wishart(m, Σ) for some (m, Σ) if and only if φ is a Jordan algebra homomorphism (with respect to the Jordan structure induced by Σ).

Cochran Theorem

• Instead of one W, we are given a finite set W1, W2, …, Wk in Sn. Then, indexing φ via φi and Q via Qi, we obtain:

• Y'WiY is Wishart(mi, Σ) for some (mi, Σ) if and only if φi is a Jordan algebra homomorphism.

• Moreover, {Y'WiY} is independent if and only if φi(Σ⁺)φi'(Σ⁺) = 0 for distinct i, i'.
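Under the simplest covariance structure (Σ = I, so the induced Jordan product is A∘B = (AB + BA)/2), the homomorphism condition for φ(t) = W ⊗ t reduces to W being idempotent; a numerical check of this (my own sketch, not the general theorem):

```python
import numpy as np

def sym(M):
    return (M + M.T) / 2

def jordan(A, B):
    """Jordan product A o B = (AB + BA)/2 (induced by the identity)."""
    return (A @ B + B @ A) / 2

def phi(W, t):
    """The linear map phi(t) = W tensor t."""
    return np.kron(W, t)

rng = np.random.default_rng(3)
t1, t2 = sym(rng.normal(size=(2, 2))), sym(rng.normal(size=(2, 2)))

W_proj = np.diag([1.0, 1.0, 0.0])   # idempotent: W^2 = W
W_bad = np.diag([2.0, 1.0, 0.0])    # symmetric but not idempotent

# phi is a Jordan homomorphism iff phi(t1 o t2) = phi(t1) o phi(t2);
# since phi(t1) o phi(t2) = W^2 tensor (t1 o t2), this forces W^2 = W.
for W in (W_proj, W_bad):
    print(np.allclose(phi(W, jordan(t1, t2)), jordan(phi(W, t1), phi(W, t2))))
```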

Page 51

Cochran theorem

The above result is equivalent to, but more transparent than, the Cochran theorems in

• Masaro, J. and Wong, C. S. (2003). Wishart distributions associated with matrix quadratic forms, J. Multivariate Anal. 85, 1-9.

• Wong, C. S. and Wang, T. (1993). Multivariate versions of Cochran theorems II, J. Multivariate Anal. 44, 146-159.

Cochran theorem

Moreover, we can generalize it to a much more general result by replacing Wishart with Wishart-Laplace and real Y with complex and quaternion Y with a general covariance structure:

• Masaro, J. and Wong, C. S. (to appear in 2010). Wishart-Laplace distributions associated with matrix quadratic forms, J. Multivariate Anal. (11 printed pages)

 

Page 53

Cochran theorem

• Masaro, J. and Wong, C. S. (2010). Characterization of Wishart-Laplace distributions via Jordan algebra homomorphisms, Linear Algebra Appl. 432, 1578-1594.

• Masaro, J. and Wong, C. S. (2009). Wishartness of quadratic forms: a characterization via Jordan algebra representations, Tamsui Oxford J. Mathematical Sciences 25(1), 87-117.

 

Page 54

Cochran theorem

For the first paper on multivariate Laplace-Wishart distributions, see

• Wong, C. S. and Wang, T. (1995). Laplace-Wishart distributions and Cochran theorems, Sankhya A 57, 342-359.

Thus, among many other Cochran theorems, it generalizes the Cochran theorem in

Cochran theorem

• Wong, C. S. and Wang, T. (1993). Multivariate versions of Cochran theorems II, J. Multivariate Anal. 44, 146-159.


Page 56

Epilogue

• Some statisticians are also probabilists or pure mathematicians who assume that the model is given and are interested in generalizations, difficult problems, simplicity (beauty) and their relations to various problems;

Page 57

Epilogue

• some statisticians are interested in fitting data to a model, simulating data according to a given model, and analysing data through computer packages.

• Without politics (using power to 'squeeze'), the above statements are neutral.

Epilogue

• I got my Ph.D. degree from the University of Illinois at Urbana. When I took my first course in statistics, I was equipped with measure theory and functional analysis to the level of Naimark's book on Normed Rings. My teacher taught me statistics without mentioning measurability and I could not understand the meaning of random variables.

Epilogue

• In the second semester, I took a course from R. A. Wijsman based on E. L. Lehmann's book Testing Statistical Hypotheses. During my classroom presentation, he said I was sloppy, demanding a proof when I said

(1) a certain function is weak-star continuous, and

(2) for a certain picture (or Hahn-Banach theorem argument), a certain flat separates two convex sets.

Page 60

Epilogue

I asked to delay my presentation and this was granted. The class soon shrank to five students.

• After the mid-term examination papers were returned and the sum of the class marks was announced, I knew I stood first because my mark exceeded half of the class sum.

Epilogue

I got a grade of A for the course but had very little feeling for statistics until I started to teach Decision Theory from Ferguson's (1967) book and tried to make my lectures rigorous. For some time, I got the following comments:

Page 62

Epilogue

• (1) A professor in statistics: Chi Song's heart is in mathematics and his foot is in statistics;

(2) a Ph.D. student representative in statistics: Dr. Wong's course in statistics is like a probability course and his probability course is like a measure theory course.

Epilogue

• See my books to decide if you agree:

A. Chi Song Wong (2000). Probability (assumes measure theory), Hunan Science and Technology Publisher.

B. Chi Song Wong (1979, 1980). Mathematical Statistics (3 volumes, assumes functional analysis), Tamkang University Press.

Epilogue

Thanks for listening!

*Speaker

-----The End-----
