
Statistics 550 Notes 8

Reading: Section 1.6.1-1.6.4

I. Correction on Minimal Sufficiency

The statement of Theorem 2 in Notes 7 was wrong. The correct statement is

Theorem 2 (Lehmann and Scheffe, 1950): Suppose $T(X)$ is a sufficient statistic for $\theta$. Also suppose that if for two sample points $x$ and $y$, the ratio $p(x \mid \theta)/p(y \mid \theta)$ is constant as a function of $\theta$, then $T(x) = T(y)$. Then $T(X)$ is a minimal sufficient statistic for $\theta$.

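Proof: Let $T'(X)$ be any statistic that is sufficient for $\theta$. By the factorization theorem, there exist functions $g'$ and $h'$ such that $p(x \mid \theta) = g'(T'(x), \theta)\, h'(x)$. Let $x$ and $y$ be any two sample points with $T'(x) = T'(y)$. Then

$$\frac{p(x \mid \theta)}{p(y \mid \theta)} = \frac{g'(T'(x), \theta)\, h'(x)}{g'(T'(y), \theta)\, h'(y)} = \frac{h'(x)}{h'(y)}.$$

Since this ratio does not depend on $\theta$, the assumptions of the theorem imply that $T(x) = T(y)$. Thus, $T$ induces a partition of the sample space that is at least as coarse as the partition induced by $T'$, and consequently $T(X)$ is minimal sufficient.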

Example 1: Consider the ratio

$$\frac{p(x \mid \theta)}{p(y \mid \theta)}.$$

If this ratio is constant as a function of $\theta$, then we must have $T(x) = T(y)$. Since we have shown that $T(X)$ is a sufficient statistic, it follows from the above sentence and Theorem 2 that $T(X)$ is a minimal sufficient statistic.
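The condition in Theorem 2 can be checked by brute force on a small model. The sketch below assumes, purely for illustration, an iid Bernoulli($\theta$) sample of size 3 with $T(x) = \sum_i x_i$; the model, the grid of $\theta$ values, and the helper names are choices made here and are not taken from the notes.

```python
from itertools import product


def bernoulli_likelihood(x, theta):
    """Joint pmf of iid Bernoulli(theta) observations x (a tuple of 0/1 values)."""
    s = sum(x)
    return theta ** s * (1 - theta) ** (len(x) - s)


def ratio_is_constant(x, y, thetas, tol=1e-12):
    """True if p(x|theta)/p(y|theta) takes the same value at every theta in thetas."""
    ratios = [bernoulli_likelihood(x, t) / bernoulli_likelihood(y, t) for t in thetas]
    return max(ratios) - min(ratios) < tol


thetas = [0.1, 0.3, 0.5, 0.7, 0.9]          # illustrative grid of parameter values
points = list(product([0, 1], repeat=3))    # all 8 sample points for n = 3

# On this grid the ratio is constant exactly when the two points have the same sum.
agrees = all(
    ratio_is_constant(x, y, thetas) == (sum(x) == sum(y))
    for x in points
    for y in points
)
print(agrees)
```

A finite grid cannot prove constancy for all $\theta$, so this is a sanity check of the hypothesis of Theorem 2 rather than a proof.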

II. Exponential Families

The binomial and normal models exhibited the interesting feature that there is a natural minimal sufficient statistic whose dimension is independent of the sample size. The exponential family models are a general class of models that exhibit this feature.

The class of exponential family models includes many of the most widely used statistical models (e.g., binomial, normal, gamma, Poisson, multinomial). Exponential family models have an underlying structure with elegant properties that we will discuss.

One-parameter exponential families: The family of distributions of a model $\{P_\theta : \theta \in \Theta\}$ is said to be a one-parameter exponential family if there exist real-valued functions $\eta(\theta), B(\theta), T(x), h(x)$ such that the pdf or pmf may be written as

$$p(x \mid \theta) = h(x) \exp\{\eta(\theta) T(x) - B(\theta)\} \qquad (0.1)$$

Comments:

(1) For an exponential family, the support of the distribution (i.e., $\{x : p(x \mid \theta) > 0\}$) cannot depend on $\theta$. Thus, iid Uniform$(0, \theta)$ is not an exponential family model.

(2) For an exponential family model, $T(X)$ is a sufficient statistic by the factorization theorem.

(3) $\eta, B, T, h$ are not unique. For example, $\eta$ can be multiplied by a constant c and T can be divided by the same constant c.

Examples of one-parameter exponential family models:

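(1) Poisson family.

Let $X \sim$ Poisson($\theta$). Then for $x \in \{0, 1, 2, \ldots\}$,

$$p(x \mid \theta) = \frac{\theta^x e^{-\theta}}{x!} = \frac{1}{x!} \exp\{x \log\theta - \theta\}.$$

This is a one-parameter exponential family with

$$\eta(\theta) = \log\theta, \quad B(\theta) = \theta, \quad T(x) = x, \quad h(x) = \frac{1}{x!}.$$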

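(2) Binomial family.

Let $X \sim$ Binomial($n, \theta$). Then for $x \in \{0, 1, \ldots, n\}$,

$$p(x \mid \theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x} = \binom{n}{x} \exp\left\{x \log\frac{\theta}{1-\theta} + n \log(1-\theta)\right\}.$$

This is a one-parameter exponential family with

$$\eta(\theta) = \log\frac{\theta}{1-\theta}, \quad B(\theta) = -n\log(1-\theta), \quad T(x) = x, \quad h(x) = \binom{n}{x}.$$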
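As a quick numerical check of these two factorizations, the sketch below compares each pmf with $h(x)\exp\{\eta(\theta)T(x) - B(\theta)\}$. It assumes the standard choices $\eta = \log\theta$, $B = \theta$, $h(x) = 1/x!$ for the Poisson and $\eta = \log(\theta/(1-\theta))$, $B = -n\log(1-\theta)$, $h(x) = \binom{n}{x}$ for the binomial, with $T(x) = x$ in both; the function names and the values $\theta = 0.3$, $n = 10$ are illustrative.

```python
import math


def expfam_pmf(x, eta, B, T, h):
    """Evaluate h(x) * exp{eta * T(x) - B} for fixed values of eta and B."""
    return h(x) * math.exp(eta * T(x) - B)


def poisson_pmf(x, theta):
    return theta ** x * math.exp(-theta) / math.factorial(x)


def binomial_pmf(x, n, theta):
    return math.comb(n, x) * theta ** x * (1 - theta) ** (n - x)


theta, n = 0.3, 10  # illustrative parameter values

# Largest discrepancy between the usual pmf and its exponential family form.
poisson_gap = max(
    abs(
        poisson_pmf(x, theta)
        - expfam_pmf(x, math.log(theta), theta,
                     lambda v: v, lambda v: 1 / math.factorial(v))
    )
    for x in range(20)
)
binomial_gap = max(
    abs(
        binomial_pmf(x, n, theta)
        - expfam_pmf(x, math.log(theta / (1 - theta)), -n * math.log(1 - theta),
                     lambda v: v, lambda v: math.comb(n, v))
    )
    for x in range(n + 1)
)
print(poisson_gap, binomial_gap)
```

Both gaps should be at the level of floating-point rounding error.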

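The family of distributions obtained by taking iid samples from a one-parameter exponential family is itself a one-parameter exponential family.

Specifically, suppose $X \sim P_\theta$ and $\{P_\theta : \theta \in \Theta\}$ is an exponential family. Then for $X_1, \ldots, X_n$ iid with common distribution $P_\theta$,

$$p(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^n h(x_i) \exp\{\eta(\theta) T(x_i) - B(\theta)\} = \left[\prod_{i=1}^n h(x_i)\right] \exp\left\{\eta(\theta) \sum_{i=1}^n T(x_i) - n B(\theta)\right\}.$$

A sufficient statistic is $\sum_{i=1}^n T(X_i)$, and it is one-dimensional whatever the sample size n is.

For $X_1, \ldots, X_n$ iid Poisson($\theta$), the sufficient statistic $\sum_{i=1}^n X_i$ has a Poisson($n\theta$) distribution and hence has an exponential family model. It is generally true that the sufficient statistic of an exponential family model follows an exponential family.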
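The Poisson claim can be verified exactly (up to rounding) by convolving pmfs. The sketch below assumes $n = 4$ iid Poisson($\theta$) variables with $\theta = 0.8$ and compares the pmf of their sum with the Poisson($n\theta$) pmf; the truncation point `K` and the helper names are illustrative choices.

```python
import math


def poisson_pmf(x, theta):
    return theta ** x * math.exp(-theta) / math.factorial(x)


def convolve(p, q):
    """pmf of the sum of two independent variables with pmfs p and q on 0..len(p)-1.

    Entries beyond the truncation point are dropped; entries below it are exact,
    because a sum equal to t only involves components that are at most t.
    """
    out = [0.0] * len(p)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j < len(out):
                out[i + j] += pi * qj
    return out


theta, n, K = 0.8, 4, 30  # illustrative values; K is the truncation point
single = [poisson_pmf(x, theta) for x in range(K)]

total = single
for _ in range(n - 1):
    total = convolve(total, single)

max_gap = max(abs(total[t] - poisson_pmf(t, n * theta)) for t in range(K))
print(max_gap)
```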

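Theorem 1.6.1: Let $\{P_\theta : \theta \in \Theta\}$ be a one-parameter exponential family of discrete distributions:

$$p(x \mid \theta) = h(x) \exp\{\eta(\theta) T(x) - B(\theta)\}.$$

Then the family of the distributions of the statistic $T(X)$ is a one-parameter exponential family of discrete distributions whose pmf may be written

$$P_\theta(T(X) = t) = h^*(t) \exp\{\eta(\theta) t - B(\theta)\}$$

for suitable h*.

Proof: By definition,

$$P_\theta(T(X) = t) = \sum_{\{x : T(x) = t\}} p(x \mid \theta) = \sum_{\{x : T(x) = t\}} h(x) \exp\{\eta(\theta) T(x) - B(\theta)\} = \exp\{\eta(\theta) t - B(\theta)\} \sum_{\{x : T(x) = t\}} h(x).$$

If we let $h^*(t) = \sum_{\{x : T(x) = t\}} h(x)$, the result follows.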

A similar theorem holds for continuous exponential families.
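The construction of $h^*$ in the proof of Theorem 1.6.1 can be carried out by enumeration when the sample space is small. The sketch below assumes, as an illustration not taken from the notes, that $X = (X_1, \ldots, X_4)$ is iid Bernoulli($\theta$), so that $h(x) = 1$, $T(x) = \sum_i x_i$, $\eta = \log(\theta/(1-\theta))$, and $B = -n\log(1-\theta)$.

```python
import math
from collections import Counter
from itertools import product

n, theta = 4, 0.35  # illustrative values
eta = math.log(theta / (1 - theta))
B = -n * math.log(1 - theta)


def h(x):
    return 1.0


def T(x):
    return sum(x)


def p(x):
    """Joint pmf in exponential family form."""
    return h(x) * math.exp(eta * T(x) - B)


h_star = Counter()   # h*(t) = sum of h(x) over {x : T(x) = t}
prob_T = Counter()   # P(T(X) = t) = sum of p(x) over {x : T(x) = t}
for x in product([0, 1], repeat=n):
    h_star[T(x)] += h(x)
    prob_T[T(x)] += p(x)

# The pmf of T should equal h*(t) * exp{eta * t - B} for every t.
max_gap = max(abs(prob_T[t] - h_star[t] * math.exp(eta * t - B)) for t in h_star)
print(dict(h_star), max_gap)
```

Here $h^*(t)$ counts the sample points with $t$ successes, so it coincides with the binomial coefficient $\binom{n}{t}$.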

A useful reparameterization of the exponential family model is to index $\eta$ as the parameter to yield

$$p(x \mid \eta) = h(x) \exp\{\eta T(x) - A(\eta)\}, \qquad (0.2)$$

where $A(\eta) = \log \int h(x) \exp\{\eta T(x)\}\, dx$ in the continuous case and the integral is replaced by a sum in the discrete case. If $\theta \in \Theta$, then $A(\eta(\theta))$ must be finite. Let $\mathcal{E} = \{\eta : A(\eta) < \infty\}$. The model given by (0.2) with $\eta$ ranging over $\mathcal{E}$ is called the canonical one-parameter exponential family generated by T and h. $\mathcal{E}$ is called the natural parameter space and T is called the natural sufficient statistic. The canonical one-parameter exponential family contains the one-parameter exponential family (0.1) with parameter space $\Theta$, and $\mathcal{E}$ can be thought of as the biggest possible parameter space for the exponential family.

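Example 1: Let $X \sim$ Poisson($\theta$). Then for $x \in \{0, 1, 2, \ldots\}$,

$$p(x \mid \theta) = \frac{1}{x!} \exp\{x \log\theta - \theta\}. \qquad (0.3)$$

Letting $\eta = \log\theta$, we have

$$p(x \mid \eta) = \frac{1}{x!} \exp\{\eta x - e^{\eta}\}.$$

We have

$$A(\eta) = \log \sum_{x=0}^{\infty} \frac{1}{x!} \exp\{\eta x\} = \log \sum_{x=0}^{\infty} \frac{(e^{\eta})^x}{x!} = \log \exp\{e^{\eta}\} = e^{\eta}.$$

Thus, $\mathcal{E} = (-\infty, \infty)$.

Note that if the parameter space $\Theta$ were restricted to a proper subset of $(0, \infty)$, then (0.3) would still be a one-parameter exponential family, but it would be a strict subset of the canonical one-parameter exponential family generated by T and h with natural parameter space $\mathcal{E} = (-\infty, \infty)$.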
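The closed form $A(\eta) = e^{\eta}$ for the Poisson family can be checked against the defining sum $A(\eta) = \log\sum_{x \ge 0} e^{\eta x}/x!$. The sketch below truncates the series at 200 terms, which is an assumption that is comfortably adequate for the moderate values of $\eta$ used here; the function name is illustrative.

```python
import math


def A_numeric(eta, terms=200):
    """log of sum_{x >= 0} exp(eta * x) / x!, truncated after `terms` terms."""
    total = 0.0
    term = 1.0  # the x = 0 term: exp(0) / 0!
    for x in range(terms):
        total += term
        term *= math.exp(eta) / (x + 1)  # move from the x-th term to the (x+1)-th
    return math.log(total)


etas = [-3.0, -1.0, 0.0, 1.0, 2.0]  # illustrative values of the natural parameter
gaps = [abs(A_numeric(e) - math.exp(e)) for e in etas]
print(max(gaps))
```

The sum is finite for every real $\eta$, which is the numerical counterpart of the statement that the natural parameter space is the whole real line.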

A useful result about exponential families is the following computational shortcut for moments of the natural sufficient statistic:

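Theorem 1.6.2: If X is distributed according to (0.2) and $\eta$ is an interior point of $\mathcal{E}$, then the moment-generating function of $T(X)$ exists and is given by

$$M(s) = E[\exp\{s T(X)\}] = \exp\{A(s + \eta) - A(\eta)\}$$

for s in some neighborhood of 0. Moreover,

$$E[T(X)] = A'(\eta), \quad \mathrm{Var}[T(X)] = A''(\eta).$$

Proof: This is the proof for the continuous case.

$$M(s) = E[\exp\{s T(X)\}] = \int h(x) \exp\{(s + \eta) T(x) - A(\eta)\}\, dx = \exp\{A(s + \eta) - A(\eta)\} \int h(x) \exp\{(s + \eta) T(x) - A(s + \eta)\}\, dx = \exp\{A(s + \eta) - A(\eta)\}$$

because the last factor, being the integral of a density, is one. The rest of the theorem follows from the moment generating property of $M(s)$ (see Section A.12 of Bickel and Doksum).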

Comment on proof: In order for the moment generating function (MGF) properties to hold, the MGF must exist (be finite) for s in some neighborhood of 0. The proof that the MGF exists for s in some neighborhood of 0 relies on the fact that $\mathcal{E}$ is an interval or $(-\infty, \infty)$, which is established in Section 1.6.4.
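The MGF formula of Theorem 1.6.2 can be checked numerically in a discrete case. The sketch below assumes the Poisson family in canonical form, with $T(X) = X$, $\eta = \log\theta$, and $A(\eta) = e^{\eta}$, and compares a truncated series for $E[e^{sX}]$ with $\exp\{A(s + \eta) - A(\eta)\}$; the function names, $\theta = 2.5$, and the grid of $s$ values are illustrative.

```python
import math


def A(eta):
    """Log-normalizer of the canonical Poisson family."""
    return math.exp(eta)


def mgf_direct(s, theta, terms=200):
    """E[exp(s * X)] for X ~ Poisson(theta), by truncated summation."""
    total = 0.0
    term = math.exp(-theta)  # the x = 0 term: P(X = 0) * exp(s * 0)
    for x in range(terms):
        total += term
        term *= theta * math.exp(s) / (x + 1)
    return total


theta = 2.5
eta = math.log(theta)
s_values = (-0.5, -0.1, 0.0, 0.1, 0.5)
gaps = [abs(mgf_direct(s, theta) - math.exp(A(s + eta) - A(eta))) for s in s_values]
print(max(gaps))
```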

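Example 1 continued: Let $X \sim$ Poisson($\theta$). The natural sufficient statistic is $T(X) = X$ and $\eta = \log\theta$, $A(\eta) = e^{\eta}$. Thus, using Theorem 1.6.2,

$$E(X) = A'(\eta) = e^{\eta} = \theta, \quad \mathrm{Var}(X) = A''(\eta) = e^{\eta} = \theta.$$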
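When $A$ is awkward to differentiate by hand, the moment formulas $E[T(X)] = A'(\eta)$ and $\mathrm{Var}[T(X)] = A''(\eta)$ can be approximated with finite differences. The sketch below does this for the Poisson case, assuming $A(\eta) = e^{\eta}$ and $\eta = \log\theta$; the central-difference helpers and the step size are illustrative choices, not part of the notes.

```python
import math


def A(eta):
    """Log-normalizer of the canonical Poisson family."""
    return math.exp(eta)


def first_derivative(f, x, step=1e-4):
    """Central-difference approximation to f'(x)."""
    return (f(x + step) - f(x - step)) / (2 * step)


def second_derivative(f, x, step=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + step) - 2 * f(x) + f(x - step)) / step ** 2


theta = 3.0
eta = math.log(theta)
mean_T = first_derivative(A, eta)    # approximates E[X]
var_T = second_derivative(A, eta)    # approximates Var[X]
print(mean_T, var_T)
```

Both values should be close to $\theta$, the mean and variance of a Poisson($\theta$) variable.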

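Example 2: Suppose $X_1, \ldots, X_n$ is a sample from a population with pdf

$$p(x \mid \theta) = \frac{x}{\theta^2} \exp\left\{-\frac{x^2}{2\theta^2}\right\}, \quad x > 0, \ \theta > 0.$$

This is known as the Rayleigh distribution. It is used to model the density of time until failure for certain types of equipment. The data come from an exponential family:

$$p(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^n \frac{x_i}{\theta^2} \exp\left\{-\frac{x_i^2}{2\theta^2}\right\} = \left(\prod_{i=1}^n x_i\right) \exp\left\{-\frac{1}{2\theta^2} \sum_{i=1}^n x_i^2 - n \log\theta^2\right\}.$$

Here $\eta = -\frac{1}{2\theta^2}$, $T(x) = \sum_{i=1}^n x_i^2$, $B(\theta) = n \log\theta^2$, and $A(\eta) = -n \log(-2\eta)$.

Therefore, the natural sufficient statistic $\sum_{i=1}^n X_i^2$ has mean $A'(\eta) = -n/\eta = 2n\theta^2$ and variance $A''(\eta) = n/\eta^2 = 4n\theta^4$.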
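The Rayleigh moments can be checked two ways: by differentiating the log-normalizer numerically and by simulation. The sketch below assumes $A(\eta) = -n\log(-2\eta)$ with $\eta = -1/(2\theta^2)$, and draws Rayleigh variables by inverting the cdf $F(x) = 1 - \exp\{-x^2/(2\theta^2)\}$; the values $n = 5$, $\theta = 1.5$, the seed, and the number of replications are illustrative.

```python
import math
import random


def A(eta, n):
    """Log-normalizer -n * log(-2 * eta), defined for eta < 0."""
    return -n * math.log(-2 * eta)


def rayleigh_draw(theta, rng):
    """Inverse-cdf draw from the Rayleigh distribution with parameter theta."""
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    return theta * math.sqrt(-2 * math.log(u))


n, theta = 5, 1.5
eta = -1 / (2 * theta ** 2)
step = 1e-4

# Central differences for A'(eta) and A''(eta).
mean_formula = (A(eta + step, n) - A(eta - step, n)) / (2 * step)
var_formula = (A(eta + step, n) - 2 * A(eta, n) + A(eta - step, n)) / step ** 2

# Monte Carlo estimate of the mean and variance of T = sum of X_i^2.
rng = random.Random(0)
reps = 20000
draws = [sum(rayleigh_draw(theta, rng) ** 2 for _ in range(n)) for _ in range(reps)]
mean_mc = sum(draws) / reps
var_mc = sum((d - mean_mc) ** 2 for d in draws) / (reps - 1)

print(mean_formula, mean_mc, var_formula, var_mc)
```

The derivative-based values should match $2n\theta^2$ and $4n\theta^4$ closely, while the simulated values agree only up to Monte Carlo error.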

Proving that a one parameter family is not an exponential family

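A one parameter exponential family is a family

$$\{p(x \mid \theta) = h(x) \exp\{\eta(\theta) T(x) - B(\theta)\}, \ \theta \in \Theta\}.$$

Consider a one parameter family $\{p(x \mid \theta), \ \theta \in \Theta\}$. If the support of $p(x \mid \theta)$ is different for different $\theta$, then the family is not an exponential family, because in an exponential family $p(x \mid \theta) > 0$ if and only if $h(x) > 0$.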

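Suppose that the support of $p(x \mid \theta)$ is the same for all $\theta$. On the support, we can write the pdf or pmf of the family as

$$p(x \mid \theta) = \exp\{\log p(x \mid \theta)\}.$$

In order for this to be an exponential family, we need to be able to write

$$\log p(x \mid \theta) = \log h(x) + \eta(\theta) T(x) - B(\theta) \qquad (0.4)$$

for some functions $\eta, T, B, h$.

Suppose (0.4) holds with $\eta$ and $B$ differentiable in $\theta$. Then for any two sample points $x_1$ and $x_2$,

$$\frac{\partial}{\partial\theta} \log p(x_1 \mid \theta) - \frac{\partial}{\partial\theta} \log p(x_2 \mid \theta) = \eta'(\theta) [T(x_1) - T(x_2)],$$

and for any four sample points $x_1$, $x_2$, $x_3$, $x_4$ with $T(x_3) \neq T(x_4)$,

$$\frac{\frac{\partial}{\partial\theta} \log p(x_1 \mid \theta) - \frac{\partial}{\partial\theta} \log p(x_2 \mid \theta)}{\frac{\partial}{\partial\theta} \log p(x_3 \mid \theta) - \frac{\partial}{\partial\theta} \log p(x_4 \mid \theta)} = \frac{T(x_1) - T(x_2)}{T(x_3) - T(x_4)}$$

is constant as a function of $\theta$ wherever $\eta'(\theta) \neq 0$.

Thus, a necessary condition for a one-parameter exponential family is that for any four sample points $x_1$, $x_2$, $x_3$, $x_4$, the ratio

$$\frac{\frac{\partial}{\partial\theta} \log p(x_1 \mid \theta) - \frac{\partial}{\partial\theta} \log p(x_2 \mid \theta)}{\frac{\partial}{\partial\theta} \log p(x_3 \mid \theta) - \frac{\partial}{\partial\theta} \log p(x_4 \mid \theta)}$$

must be constant as a function of $\theta$ wherever it is defined.

Proof that the Cauchy family is not an exponential family:

The Cauchy family is

$$p(x \mid \theta) = \frac{1}{\pi (1 + (x - \theta)^2)}, \quad -\infty < x < \infty, \ -\infty < \theta < \infty.$$

Thus, for the Cauchy family,

$$\log p(x \mid \theta) = -\log\pi - \log(1 + (x - \theta)^2), \quad \frac{\partial}{\partial\theta} \log p(x \mid \theta) = \frac{2(x - \theta)}{1 + (x - \theta)^2}.$$

For any four sample points $x_1, x_2, x_3, x_4$, the ratio above is

$$\frac{\frac{x_1 - \theta}{1 + (x_1 - \theta)^2} - \frac{x_2 - \theta}{1 + (x_2 - \theta)^2}}{\frac{x_3 - \theta}{1 + (x_3 - \theta)^2} - \frac{x_4 - \theta}{1 + (x_4 - \theta)^2}}.$$

This is not constant as a function of $\theta$ (for example, with $(x_1, x_2, x_3, x_4) = (0, 1, 2, 4)$ it equals $13/14$ at $\theta = -1$ and $-1/10$ at $\theta = 3$), so the Cauchy family is not an exponential family.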
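The four-point check can be run numerically. The sketch below uses one particular form of the necessary condition, stated here as an assumption: if $\log p(x \mid \theta) = \log h(x) + \eta(\theta)T(x) - B(\theta)$ with differentiable $\eta$ and $B$, then the ratio of differences of $\partial \log p/\partial\theta$ at two pairs of sample points does not depend on $\theta$. The N($\theta$, 1) location family is included as a contrast; the sample points, $\theta$ values, and function names are illustrative.

```python
def cauchy_score(x, theta):
    """d/dtheta of the log Cauchy(theta) density: 2(x - theta) / (1 + (x - theta)^2)."""
    d = x - theta
    return 2 * d / (1 + d * d)


def normal_score(x, theta):
    """d/dtheta of the log N(theta, 1) density: x - theta."""
    return x - theta


def four_point_ratio(score, pts, theta):
    """Ratio of score differences at (x1, x2) and (x3, x4) for a given theta."""
    x1, x2, x3, x4 = pts
    return (score(x1, theta) - score(x2, theta)) / (score(x3, theta) - score(x4, theta))


pts = (0.0, 1.0, 2.0, 4.0)     # illustrative sample points
thetas = [-1.0, 0.5, 3.0]      # illustrative parameter values

cauchy_ratios = [four_point_ratio(cauchy_score, pts, t) for t in thetas]
normal_ratios = [four_point_ratio(normal_score, pts, t) for t in thetas]
print(cauchy_ratios, normal_ratios)
```

The normal ratios coincide across $\theta$, as they must for an exponential family, while the Cauchy ratios change with $\theta$.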

III. Multiparameter exponential families

One-parameter exponential families have a natural one-dimensional sufficient statistic regardless of the sample size. A k-parameter exponential family has a k-dimensional sufficient statistic regardless of the sample size.

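The family of distributions of a model $\{P_\theta : \theta \in \Theta\}$ is said to be a k-parameter exponential family if there exist real-valued functions $\eta_1, \ldots, \eta_k$ and $B$ of $\theta$, and real-valued functions $T_1, \ldots, T_k$ and $h$ of $x$, such that the pdf or pmf may be written as

$$p(x \mid \theta) = h(x) \exp\left\{\sum_{j=1}^k \eta_j(\theta) T_j(x) - B(\theta)\right\} \qquad (0.5)$$

By the factorization theorem, $T(X) = (T_1(X), \ldots, T_k(X))$ is a sufficient statistic.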

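Example 1: Suppose $X_1, \ldots, X_n$ is iid $N(\mu, \sigma^2)$. Then

$$p(x_1, \ldots, x_n \mid \mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x_i - \mu)^2}{2\sigma^2}\right\} = \exp\left\{\frac{\mu}{\sigma^2} \sum_{i=1}^n x_i - \frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2 - \frac{n\mu^2}{2\sigma^2} - \frac{n}{2} \log(2\pi\sigma^2)\right\},$$

which corresponds to a two-parameter exponential family with $\eta_1 = \mu/\sigma^2$, $\eta_2 = -1/(2\sigma^2)$, $T_1(x) = \sum_{i=1}^n x_i$, $T_2(x) = \sum_{i=1}^n x_i^2$, $B = \frac{n\mu^2}{2\sigma^2} + \frac{n}{2} \log(2\pi\sigma^2)$, and $h(x) = 1$.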
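The two-parameter factorization of the normal sample can be verified numerically. The sketch below assumes the parameterization $\eta_1 = \mu/\sigma^2$, $\eta_2 = -1/(2\sigma^2)$, $T_1 = \sum x_i$, $T_2 = \sum x_i^2$, $B = n\mu^2/(2\sigma^2) + (n/2)\log(2\pi\sigma^2)$, $h = 1$; the data values and function names are illustrative.

```python
import math


def normal_joint_density(xs, mu, sigma2):
    """Product of N(mu, sigma2) densities over the sample xs."""
    return math.prod(
        math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        for x in xs
    )


def expfam_density(xs, mu, sigma2):
    """The same joint density written as exp{eta1 * T1 + eta2 * T2 - B}."""
    n = len(xs)
    eta1, eta2 = mu / sigma2, -1 / (2 * sigma2)
    T1, T2 = sum(xs), sum(x * x for x in xs)
    B = n * mu ** 2 / (2 * sigma2) + (n / 2) * math.log(2 * math.pi * sigma2)
    return math.exp(eta1 * T1 + eta2 * T2 - B)


xs = [0.3, -1.2, 2.5, 0.9]  # illustrative data
gap = abs(normal_joint_density(xs, 1.0, 2.0) - expfam_density(xs, 1.0, 2.0))
print(gap)
```

Because the second form depends on the data only through $(T_1, T_2)$, any other sample with the same sum and sum of squares has the same density.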

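Example 2: Multinomial. Suppose we observe n independent trials where each trial can end up in one of k possible categories {1,...,k} with probabilities $p_1, \ldots, p_k$. Let $X_1, \ldots, X_k$ be the number of outcomes in categories 1,...,k in the n trials. Then, using $x_k = n - \sum_{j=1}^{k-1} x_j$,

$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k} = \frac{n!}{x_1! \cdots x_k!} \exp\left\{\sum_{j=1}^{k-1} x_j \log\frac{p_j}{p_k} + n \log p_k\right\}.$$

The multinomial is a (k-1) parameter exponential family with $\eta_j = \log(p_j/p_k)$ and $T_j(x) = x_j$ for $j = 1, \ldots, k-1$, $h(x) = n!/(x_1! \cdots x_k!)$, and $B = -n \log p_k$.

Moments of Sufficient Statistics: As with the one-parameter exponential family, it is convenient to index the family by $\eta = (\eta_1, \ldots, \eta_k)$, so that $p(x \mid \eta) = h(x) \exp\{\sum_{j} \eta_j T_j(x) - A(\eta)\}$. The analogue of Theorem 1.6.2 that calculates the moments of the sufficient statistics is Corollary 1.6.1: for $\eta$ in the interior of the natural parameter space,

$$E[T_j(X)] = \frac{\partial A}{\partial \eta_j}(\eta), \quad \mathrm{Cov}(T_i(X), T_j(X)) = \frac{\partial^2 A}{\partial \eta_i\, \partial \eta_j}(\eta).$$

Example 2 continued: For the multinomial distribution, $A(\eta) = n \log\left(1 + \sum_{j=1}^{k-1} e^{\eta_j}\right)$, so that

$$E[X_j] = \frac{n e^{\eta_j}}{1 + \sum_{l=1}^{k-1} e^{\eta_l}} = n p_j, \quad \mathrm{Cov}(X_i, X_j) = -\frac{n e^{\eta_i} e^{\eta_j}}{\left(1 + \sum_{l=1}^{k-1} e^{\eta_l}\right)^2} = -n p_i p_j \ (i \neq j), \quad \mathrm{Var}(X_j) = n p_j (1 - p_j).$$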
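The multinomial means can be recovered from the log-normalizer by numerical differentiation. The sketch below assumes the usual parameterization with category $k$ as baseline, $\eta_j = \log(p_j/p_k)$ and $A(\eta) = n\log(1 + \sum_{j=1}^{k-1} e^{\eta_j})$; the probabilities, $n = 12$, the step size, and the helper names are illustrative.

```python
import math


def A(eta, n):
    """Log-normalizer n * log(1 + sum_j exp(eta_j)) with the last category as baseline."""
    return n * math.log(1 + sum(math.exp(e) for e in eta))


def partial(f, eta, j, step=1e-5):
    """Central-difference approximation to the partial derivative of f in coordinate j."""
    up = list(eta)
    up[j] += step
    down = list(eta)
    down[j] -= step
    return (f(up) - f(down)) / (2 * step)


n = 12
p = [0.2, 0.5, 0.3]                               # category probabilities, k = 3
eta = [math.log(pj / p[-1]) for pj in p[:-1]]     # natural parameters for categories 1..k-1

means = [partial(lambda e: A(e, n), eta, j) for j in range(len(eta))]
print(means)
```

The two partial derivatives should be close to $n p_1$ and $n p_2$.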

Curved Exponential Families:

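A curved exponential family is a family

$$\left\{p(x \mid \theta) = h(x) \exp\left\{\sum_{j=1}^k \eta_j(\theta) T_j(x) - B(\theta)\right\}, \ \theta \in \Theta\right\}$$

for which the dimension of $\theta$ is less than $k$.

An exponential family for which the dimension of $\theta$ equals $k$ is a full exponential family.

Example of a curved exponential family: $X_1, \ldots, X_n$ iid $N(\mu, \mu^2)$, $\mu \neq 0$.

This is an exponential family with $\eta_1 = 1/\mu$, $\eta_2 = -1/(2\mu^2)$, $T_1(x) = \sum_{i=1}^n x_i$, $T_2(x) = \sum_{i=1}^n x_i^2$. The parameter space is a curve:

$$\{(\eta_1, \eta_2) : \eta_2 = -\eta_1^2/2, \ \eta_1 \neq 0\}.$$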
