
2.4 - Goodness-of-Fit Test


A goodness-of-fit test, in general, refers to measuring how well the observed data correspond to the fitted (assumed) model. We will use this concept throughout the course as a way of checking the model fit. As in linear regression, in essence, the goodness-of-fit test compares the observed values to the expected (fitted or predicted) values.

A goodness-of-fit statistic tests the following hypothesis:

H0: the model M0 fits

vs.

HA: the model M0 does not fit (or, some other model MA fits)

Most often the observed data represent the fit of the saturated model, the most complex model possible with the given data. Thus, most often the alternative hypothesis (HA) will represent the saturated model MA, which fits perfectly because each observation has a separate parameter. Later in the course we will see that MA could be a model other than the saturated one. Let us now consider the simplest example of the goodness-of-fit test with categorical data.

In the setting for one-way tables, we measure how well an observed variable X corresponds to a Mult(n, π) model for some vector of cell probabilities, π. We will consider two cases:

1. when vector π is known, and
2. when vector π is unknown.

In other words, we assume that under the null hypothesis the data come from a Mult(n, π) distribution, and we test whether that model fits against the fit of the saturated model. The rationale behind any model fitting is the assumption that a complex mechanism of data generation may be represented by a simpler model. The goodness-of-fit test is applied to corroborate our assumption.

Consider our Dice Example (/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson02/dice_example.png) from the Introduction. We want to test the hypothesis that the six sides are equally probable; that is, we compare the observed frequencies to the assumed model: X ∼ Mult(n = 30, π0 = (1/6, 1/6, 1/6, 1/6, 1/6, 1/6)). You can think of this as simultaneously testing whether the probability in each cell is equal to a specified value, e.g.

H0: (π1, π2, π3, π4, π5, π6) = (1/6, 1/6, 1/6, 1/6, 1/6, 1/6)



vs.

HA: (π1, π2, π3, π4, π5, π6) ≠ (1/6, 1/6, 1/6, 1/6, 1/6, 1/6).

Most software packages have built-in functions that will do this for you; see the next section for examples (https://onlinecourses.science.psu.edu/stat504/node/61) in SAS and R. Here is a step-by-step procedure to help you conceptually understand this test better and what is going on behind these functions; a minimal R sketch of the built-in route follows the steps.

Step 1: If vector π is unknown, estimate the unknown parameters and proceed to Step 2; if vector π is known, proceed directly to Step 2.

Step 2: Calculate the estimated (fitted) cell probabilities, the π̂j's, and the expected cell frequencies, the Ej's, under H0.

Step 3: Calculate the Pearson goodness-of-fit statistic, X2, and/or the deviance statistic, G2, and compare them to appropriate chi-squared distributions to make a decision.

Step 4: If the decision is borderline or if the null hypothesis is rejected, further investigate which observations may be influential by looking, for example, at residuals (../node/62).
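As a minimal sketch of the built-in route in R, the following runs the Pearson test for the dice example. The observed counts here are hypothetical (the actual counts are in the linked dice image); only n = 30 and the equal-probability null come from the text.

    # Hypothetical counts for faces 1-6 (n = 30); the real counts are in
    # dice_example.png and may differ.
    observed <- c(3, 7, 5, 10, 2, 3)
    p0 <- rep(1/6, 6)                  # cell probabilities under H0
    chisq.test(observed, p = p0)       # Pearson X^2 test with df = k - 1 = 5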

Pearson and deviance test statistics

The Pearson goodness-of-fit statistic is

\[ X^2 = \sum_{j=1}^{k} \frac{(X_j - n\pi_j)^2}{n\pi_j} \]

An easy way to remember it is

\[ X^2 = \sum_{j} \frac{(O_j - E_j)^2}{E_j} \]

where Oj = Xj is the observed count in cell j, and Ej = E(Xj) = nπj is the expected count in cell j under the assumption that the null hypothesis is true, i.e., that the assumed model is a good one. Notice that π̂j is the estimated (fitted) cell proportion πj under H0.

The deviance statistic is

\[ G^2 = 2 \sum_{j=1}^{k} X_j \log\left(\frac{X_j}{n\pi_j}\right) \]

where "log" means natural logarithm. An easy way to remember it is

\[ G^2 = 2 \sum_{j} O_j \log\left(\frac{O_j}{E_j}\right) \]

In some texts, G2 is also called the likelihood-ratio test statistic, for comparing the likelihoods (http://onlinecourses.science.psu.edu/stat504/node/27) (l0 and l1) of two models, that is, comparing the loglikelihood under H0 (i.e., the loglikelihood of the fitted model, L0) and the loglikelihood under HA (i.e., the loglikelihood of the larger, less restricted, or saturated model, L1): G2 = −2 log(l0/l1) = −2(L0 − L1). A common mistake in calculating G2 is to leave out the factor of 2 at the front.
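To make the formulas concrete, here is a by-hand sketch in R, reusing the hypothetical counts from the sketch above; it reproduces both statistics and their large-sample p-values.

    # By-hand X^2 and G^2 for the hypothetical dice counts
    observed <- c(3, 7, 5, 10, 2, 3)
    n <- sum(observed)                       # n = 30
    expected <- n * rep(1/6, 6)              # E_j = n * pi_j = 5 in every cell
    X2 <- sum((observed - expected)^2 / expected)       # Pearson statistic
    G2 <- 2 * sum(observed * log(observed / expected))  # deviance statistic
    df <- length(observed) - 1               # k - 1 = 5
    c(X2 = X2, G2 = G2,
      p.X2 = 1 - pchisq(X2, df),
      p.G2 = 1 - pchisq(G2, df))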

Note that X2 and G2 are both functions of the observed data X and a vector of probabilities π. For this reason, we will sometimes write them as X2(x, π) and G2(x, π), respectively; when there is no ambiguity, however, we will simply use X2 and G2. We will be dealing with these statistics throughout the course: in the analysis of 2-way and k-way tables, and when assessing the fit of log-linear and logistic regression models.

Testing the Goodness-of-Fit

X2 and G2 both measure how closely the model, in this case Mult(n, π), "fits" the observed data.

If the sample proportions pj = Xj/n (i.e., the saturated model) are exactly equal to the model's πj for cells j = 1, 2, . . . , k, then Oj = Ej for all j, and both X2 and G2 will be zero. That is, the model fits perfectly.

If the sample proportions pj deviate from the π̂j's computed under H0, then X2 and G2 are both positive. Large values of X2 and G2 mean that the data do not agree well with the assumed/proposed model M0.

How can we judge the sizes of X2 and G2?

The answer is provided by this result:

If x is a realization of X ∼ Mult(n, π), then as n becomes large, the sampling distributions of both X2(x, π) and G2(x, π) approach a chi-squared distribution (http://onlinecourses.science.psu.edu/stat504/node/23#chi-squared) with df = k − 1, where k is the number of cells: $\chi^2_{k-1}$.

This means that we can easily test a null hypothesis H0: π = π0 against the alternative H1: π ≠ π0 for some pre-specified vector π0. An approximate α-level test of H0 versus H1 is:

Reject H0 if the computed X2(x, π0) or G2(x, π0) exceeds the theoretical value $\chi^2_{k-1}(1 - \alpha)$.

Here, $\chi^2_{k-1}(1 - \alpha)$ denotes the (1 − α)th quantile of the $\chi^2_{k-1}$ distribution, i.e., the value for which the probability that a $\chi^2_{k-1}$ random variable is less than or equal to it is 1 − α. The p-value for this test is the area to the right of the computed X2 or G2 under the $\chi^2_{k-1}$ density curve. Below is a simple visual example: consider a chi-squared distribution with df = 10, and assume that the computed test statistic is X2 = 21. For α = 0.05, the theoretical value is 18.31.


[Figure: χ2 density with df = 10, showing the critical value 18.31 (α = 0.05) and the computed statistic X2 = 21 in the right tail. R code: (/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson02/chisqdistributions.R)]

Useful functions in SAS and R to remember for computing the p-values from the chi-squared distribution are:

In R: p-value = 1 - pchisq(test statistic, df), e.g., 1 - pchisq(21, 10) = 0.021
In SAS: p-value = 1 - probchi(test statistic, df), e.g., 1 - probchi(21, 10) = 0.021
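Tying the visual example together, here is a two-line R sketch that gives both the critical value and the p-value (the numbers match those quoted above):

    qchisq(0.95, df = 10)      # theoretical value for alpha = 0.05: 18.31
    1 - pchisq(21, df = 10)    # right-tail p-value for X^2 = 21: 0.021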

You can quickly review the chi-squared distribution in Lesson 0 (https://onlinecourses.science.psu.edu/stat504/node/23), or check out http://www.statsoft.com/textbook/stathome.html (http://www.statsoft.com/textbook/stathome.html) and http://www.ruf.rice.edu/~lane/stat_sim/chisq_theor/index.html (http://www.ruf.rice.edu/%7Elane/stat_sim/chisq_theor/index.html). The STATSOFT link also has brief reviews of many other statistical concepts and methods.

Here are a few more comments on this test.

When n is large and the model is true, X2 and G2 tend to be approximately equal. For large samples, the results of the X2 and G2 tests will be essentially the same.

An old-fashioned rule of thumb is that the χ2 approximation for X2 and G2 works well provided that n is large enough to have Ej = nπj ≥ 5 for every j. Nowadays, most agree that we can have Ej < 5 for some of the cells (say, 20% of them). Some of the Ej's can be as small as 2, but none of them should fall below 1. If this happens, then the χ2 approximation isn't appropriate, and the test results are not reliable.

In practice, it's a good idea to compute both X2 and G2 to see if they lead to similar results. If the resulting p-values are close, then we can be fairly confident that the large-sample approximation is working well.

If it is apparent that one or more of the Ej's are too small, we can sometimes get around the problem by collapsing or combining cells until all the Ej's are large enough. But we can also perform small-sample or exact inference; we will see more on this in Lesson 3 (https://onlinecourses.science.psu.edu/stat504/node/89). Please note that small-sample inference can be conservative for discrete distributions, that is, it may give a larger p-value than the exact one (for more details see Agresti (2007), Sec. 1.4.3-1.4.5 and 2.6; Agresti (2013), Sec. 3.5, and for Bayesian inference Sec. 3.6).
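As one hedged illustration of a small-sample workaround available in base R (not necessarily the exact methods of Lesson 3), chisq.test can simulate a Monte Carlo p-value instead of relying on the χ2 approximation; the tiny counts below are hypothetical.

    # Hypothetical tiny sample: n = 6, so every E_j = 6 * (1/6) = 1,
    # far below the rule-of-thumb threshold of 5
    observed <- c(1, 0, 2, 1, 0, 2)
    chisq.test(observed, p = rep(1/6, 6),
               simulate.p.value = TRUE, B = 10000)  # Monte Carlo p-value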

In most applications, we will reject the null hypothesis X ∼ Mult(n, π) for large values of X2 or G2. On rare occasions, however, we may want to reject the null hypothesis for unusually small values of X2 or G2. That is, we may want to define the p-value as $P(\chi^2_{k-1} \le X^2)$ or $P(\chi^2_{k-1} \le G^2)$. Very small values of X2 or G2 suggest that the model fits the data too well, i.e., the data may have been fabricated or altered in some way to fit the model closely. This is how R. A. Fisher figured out that some of Mendel's experimental data must have been fraudulent (e.g., see Agresti (2007), page 327; Agresti (2013), page 19).
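For this rare left-tail check, a one-line R sketch (the statistic value here is made up for illustration):

    # Left-tail p-value P(chi^2_{k-1} <= X^2); a very small result would
    # flag a suspiciously good fit (hypothetical X^2 = 0.6, k = 6 cells)
    pchisq(0.6, df = 5)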