



CHAPTER 1

INTRODUCTION

"...it is not enough to know that a sample could have come from a normal population; we must be clear that it is at the same time improbable that it has come from a population differing so much from the normal as to invalidate the use of 'normal theory' tests in further handling of the material."

    E.S. Pearson, 1930

    1.1 Why Test for Normality?

The topic of this text is the problem of testing whether a sample of observations comes from a normal distribution. Normality is one of the most common assumptions made in the development and use of statistical procedures. The problem has not suffered from lack of attention. In our review of the literature we found more tests than we ever imagined existed. This text, for instance, considers about forty formal testing procedures that have been proposed to test specifically for normality, as well as plotting methods, outlier tests, general goodness of fit tests and other tests that are useful in detecting non-normality in specialized situations. Further, the list is probably not exhaustive. For example, while the sample moment skewness and kurtosis statistics are commonly used as tests of normality, many such moment tests could be considered (Chapter 3). Geary (1947)

Copyright © 2002 by Marcel Dekker, Inc. All Rights Reserved.

considered the larger class of absolute moment tests and developed quite general results concerning their power. Thode (1985) used Geary's calculations to further refine the evaluation of these absolute moment tests and found that some of these tests had modestly better power under certain circumstances than the kurtosis test or Geary's test.

The objective of this text is to summarize the vast literature on tests of normality and to describe which of the many tests are effective and which are not. Some results are surprising. Such popular tests as the Kolmogorov-Smirnov or chi-squared goodness of fit tests have power so low that they should not be seriously considered for testing normality (D'Agostino, Belanger and D'Agostino, 1990). In general, the performance of moment tests and the Wilk-Shapiro test is so impressive that we recommend their use in everyday practice.
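Both recommended families of tests are available in modern software. The following is a minimal sketch using scipy (the library, data and seed are our own illustration, not from the text): it applies the Wilk-Shapiro W test and an omnibus moment test to a hypothetical normal sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=200)  # hypothetical data

# Wilk-Shapiro W test (Shapiro and Wilk, 1965)
w_stat, w_p = stats.shapiro(x)

# D'Agostino-Pearson omnibus test combining skewness and kurtosis
k2_stat, k2_p = stats.normaltest(x)

print(f"Wilk-Shapiro: W = {w_stat:.3f}, p = {w_p:.3f}")
print(f"moment (K^2) test: K^2 = {k2_stat:.3f}, p = {k2_p:.3f}")
```

For genuinely normal data, W is close to 1 and neither p-value should be small.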

1.1.1 Historical Review of Research on Whether Assumption of Normality Is Valid or Important

Statistical procedures such as t-tests, tests for regression coefficients, analysis of variance, and the F-test of homogeneity of variance have as an underlying assumption that the sampled data come from a normal distribution. Of course, the assumption of normality in a statistical procedure requires an effective test of whether the assumption holds, or a careful argument showing that the violation of the assumption does not invalidate the procedure used. Much statistical research has been concerned with evaluating the magnitude of the effect of violations of this assumption on the true significance level of a test or the efficiency of parameter estimates.

Research evaluating the effects of violations of the assumption of normality upon standard statistical procedures dates back before Bartlett's 1935 paper on the t-test. Fisher (1930) thought the problem of such importance that he developed results on the cumulants of the skewness and kurtosis statistics as tests of normality.

This problem has come to have the generic label of "robustness" in the current statistical literature. A review of the literature of robustness and tests of normality shows many contributions from the most outstanding theorists and practitioners of statistics. Pitman (1937a, 1937b) used permutation theory to establish important results on the lack of sensitivity of the t-test and the one way analysis of variance to violations of normality.

Geary (1947) used the Gram-Charlier system of distributions as the non-normal alternatives being sampled in order to determine the validity


Figure 1.1 Actual significance level of z-test when underlying distribution has specified kurtosis value (from Geary, 1947).

of statistical procedures. The advantage of these distributions was that Geary could calculate the asymptotic moments of the absolute moment tests, and from these the approximate distribution of the tests, and hence could determine their power as tests for normality. He showed that, for the two sample z-test and for testing the homogeneity of variance, a symmetric non-normal underlying distribution can seriously affect the true significance level of the test. For a value of 1.5 for the kurtosis of the alternative distribution, the actual significance level of the test is less than 0.0001, as compared to the level of 0.05 if the distribution sampled were normal. For a distribution with a kurtosis value of 6, the probability of rejection was 0.215. Figure 1.1 presents a graph of Geary's results for other values of the kurtosis under his chosen set of alternatives.
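The distortion of the significance level is easy to reproduce by simulation. The sketch below (our own illustration; it uses a t distribution with 6 degrees of freedom as a convenient kurtosis-6 parent, not Geary's Gram-Charlier alternatives) estimates the true level of the nominal 0.05-level two-sided variance-ratio F-test under a normal and under a heavy-tailed parent.

```python
import numpy as np
from scipy import stats

def rejection_rate(sampler, n=25, reps=10000, alpha=0.05, seed=1):
    """Estimate the true level of the two-sided variance-ratio F-test."""
    rng = np.random.default_rng(seed)
    lo = stats.f.ppf(alpha / 2, n - 1, n - 1)
    hi = stats.f.ppf(1 - alpha / 2, n - 1, n - 1)
    rejections = 0
    for _ in range(reps):
        x = sampler(rng, n)
        y = sampler(rng, n)
        f_ratio = np.var(x, ddof=1) / np.var(y, ddof=1)
        rejections += (f_ratio < lo) or (f_ratio > hi)
    return rejections / reps

# Normal parent: the test holds close to its nominal 0.05 level.
normal_rate = rejection_rate(lambda rng, n: rng.normal(size=n))
# Symmetric heavy-tailed parent with kurtosis 6 (t, 6 degrees of freedom).
heavy_rate = rejection_rate(lambda rng, n: rng.standard_t(6, size=n))

print(f"normal parent: {normal_rate:.3f}, heavy-tailed parent: {heavy_rate:.3f}")
```

Under the heavy-tailed parent the rejection rate is inflated well above 0.05, in the same direction Geary reported.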

In contrast, for the t-test Geary determined that the distortion of the probability of rejection is slight if the underlying distribution is symmetric but non-normal, confirming Pitman's more general permutation distribution analysis. If the underlying distribution is skewed, marked changes to the probability of rejection can occur. For the two sample t-test Geary concluded that if the underlying distribution is the same for both populations, regardless of the type of non-normality, the changes in the probability that the null hypothesis is rejected are small. Large changes can occur if the distributions are different.

Box (1953) showed that Bartlett's test of homogeneity of variance was seriously compromised by non-normality and that it was foolish to use Bartlett's test as a pretest to determine whether the analysis of variance


might be invalid due to different variances. He summed up the effects of ignoring non-normality when comparing two samples rather well:

"So far as comparative tests on means are concerned ... this practice [of ignoring non-normality] is largely justifiable ... There is abundant evidence that these comparative tests on means are remarkably insensitive to general non-normality of the parent population."

By "general non-normality" he meant "that the departure from normality, in particular skewness, is the same in the different groups ...". On the other hand, robustness to non-normality "... is not necessarily shared by other statistical tests, and in particular is not shared by the tests for equality of variances ...".

Simulations by Pearson and Please (1975) confirmed the theoretical results on the lack of robustness of the F-test for homogeneity of variance for small samples. Using samples of size 10 and 25, they showed that non-normality seriously affects the true significance level of the single sample variance test and the two sample variance ratio, even for symmetric parent distributions. The one sample t-test is affected by skewness; the two sample t-test is not affected greatly when the samples come from identical populations.

Subrahmaniam, Subrahmaniam and Messeri (1975) looked at the effects on the true level of the test for one sample t-tests, analysis of variance and analysis of variance for regression when the underlying distribution is a location contaminated normal distribution. Their calculations were based on samples of size 20 or less. There was little effect on any of the procedures when the contamination was small. For larger contaminations and larger differences in component means, the effect on the probability of the 0.05-level t-test became quite large (probability of 0.16 for alternative with contamination fraction 0.25, standardized difference in means of 1, and sample size 20). This agrees with Geary's conclusion concerning the t-test.

Tukey (1960) started the more recent flood of consideration about the problem of robustness in estimation against slight departures from normality. He showed the effects of non-normality on the estimation of location and scale parameters of a distribution using an unbalanced mixture of two normals with common mean and different variances. He called such a mixture a scale contaminated normal and concluded that if "... contamination is a real possibility ... neither mean nor variance is likely to be a wisely chosen basis for making estimates from a large sample". Tukey's work, in fact, was the starting point for our work on the problem of testing for normality: how large were the sample sizes needed to detect the contaminations that the researchers in robustness were examining (Thode, Smith and Finch, 1983).

D'Agostino and Lee (1977) compared the efficiency of several estimates of location, including the sample mean, when the underlying distribution was either a Student's t or exponential power distribution. Both of these are symmetric families. The efficiency of the estimates was compared based on the kurtosis value of the underlying distribution. For the t distribution, the relative efficiency of the sample mean (which is 1 when the distribution is normal) only decreases to about 0.9 for a kurtosis value of 6 (corresponding to a t distribution with 6 degrees of freedom). For the exponential power distribution, however, the relative efficiency of the sample mean drops quickly and decreases to about 0.5 when the kurtosis value is 6 (the Laplace distribution).
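The Laplace figure of roughly 0.5 can be checked by a small Monte Carlo study. The sketch below is our own illustration (sample size, replication count and seed are arbitrary choices): for Laplace data it compares the sampling variance of the mean with that of the median, the maximum likelihood location estimate for that family.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 20000
samples = rng.laplace(loc=0.0, scale=1.0, size=(reps, n))

var_mean = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()

# Asymptotically var(mean) = 2/n and var(median) = 1/n for this Laplace,
# so the relative efficiency of the mean approaches 0.5.
efficiency = var_median / var_mean
print(f"relative efficiency of the mean: {efficiency:.2f}")
```

The simulated ratio lands near 0.5, matching the figure quoted from D'Agostino and Lee (1977).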

    1.1.2 Genetic Applications and Tests of Clustering

Another problem of interest to us in testing for normality arose from the problem of how one would test for a different type of mixture, the location-contaminated normal distribution (Thode, Finch and Mendell, 1988). Applications of this type are important in genetics, where the alternative often consists of a mixture of normal components with differences in the location parameters. The issue is that one can make an inference from the number of components in the mixture to aspects of the structure of the gene determining the variable being measured. For example, a mixture of two normal components suggests a simpler type of genetic model than a mixture of three normal components. This can be generalized to testing in cluster analysis, where each cluster is made up of observations from a normal distribution. In principle, then, a test of normality might have some promise as a tool in cluster analysis.

Thus a problem that can be focused on in this context is testing the null hypothesis that observations come from a single normal distribution against the alternative that they come from a mixture of several normal components; a test for normality can be used to support or reject the null hypothesis. The alternative may consist of a known or unknown number of components. The components may differ in either or both of the parameters of the normal distribution.

1.1.3 Comparative Evaluation of General Goodness of Fit Tests to Normality Tests

    There are more tests designed specifically to assess normality than for anyother particular distribution. The literature contains many tests that take


advantage of special properties of the normal distribution. For example, the general absolute moment tests take advantage of specific relations among the moments of the normal distribution. The Wilk-Shapiro test compares an estimate of the standard deviation using a linear combination of the order statistics to the usual estimate.

Intuitively, such statistics should be more sensitive to certain alternatives than the completely general goodness of fit tests such as the chi-squared test or the Kolmogorov-Smirnov test. These procedures operate by using the cumulative distribution function to reduce the general problem to the specific one of testing the hypothesis of uniformity. In addition to the very common chi-squared test and the Kolmogorov-Smirnov test, there are many other general procedures. One of our objectives will be to consider such tests to determine which are effective for the specific problem of testing for normality. Such a comparison may well provide indications of the relative value of these general tests for testing other null hypotheses.
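The reduction to uniformity is the probability integral transform. As a sketch (the parameter values, seed and library calls are our own illustration, not from the text): if X has fully specified continuous null distribution F0, then U = F0(X) is Uniform(0, 1) under the null, so any test of uniformity applies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, sigma = 5.0, 2.0  # hypothetical, fully specified null parameters
x = rng.normal(mu, sigma, size=1000)

# Under H0: N(mu, sigma^2) with both parameters known, U = F0(X) is
# Uniform(0, 1), so the general problem reduces to testing uniformity.
u = stats.norm.cdf(x, loc=mu, scale=sigma)

ks_stat, ks_p = stats.kstest(u, "uniform")
print(f"KS statistic against uniformity: {ks_stat:.4f}")
```

Since the data really are normal with the stated parameters here, the transformed values behave like a uniform sample and the KS statistic is small.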

Relatively little work has been done in the field of testing for multivariate normality compared to that done for univariate normality. Small improvements in the ability to test for univariate normality may lead to larger improvements in the ability to handle the multivariate problem. We will survey the field to date and will present our own evaluations.

    1.2 Hypothesis Testing for Distributional Assumptions

Suppose you have a random sample of n independent and identically distributed (iid) observations of a random variable X, labeled x1, x2, ..., xn, from an unspecified density f(x). The general goodness of fit problem consists of testing the null hypothesis

H0: f(x) = f0(x)

against an alternative hypothesis. The probability density function (pdf) in the null hypothesis f0(x) has a specified distributional form. When the parameters are completely specified, the null hypothesis is called a simple hypothesis. If one or more of the parameters in H0 are not specified, H0 is called a composite hypothesis.
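The simple/composite distinction matters in practice. A minimal sketch (the data, parameter values and scipy calls are our own illustration): a Kolmogorov-Smirnov test of a simple null with all parameters fixed, versus the naive composite version that plugs sample estimates into the same test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(loc=5.0, scale=1.0, size=100)

# Simple null H0: f(x) = N(0, 1): fully specified, and badly wrong here.
stat_simple, p_simple = stats.kstest(x, "norm", args=(0.0, 1.0))

# Composite null H0: f(x) = N(mu, sigma^2) with mu, sigma unspecified.
# Naively plugging in estimates changes the null distribution of the
# statistic, so the standard KS critical values no longer apply.
stat_composite, p_naive = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

print(f"simple: D={stat_simple:.3f}; composite (naive): D={stat_composite:.3f}")
```

The fully specified but wrong simple null is rejected overwhelmingly, while the estimated-parameter fit is necessarily much closer; the naive p-value in the composite case is not trustworthy, which is exactly the difficulty the composite-hypothesis EDF tests discussed below were developed to address.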

Depending upon the problem, the alternative may be completely specified (in the case of a simple null hypothesis) including the values of the parameters,

H1: f(x) = f1(x).

For composite null hypotheses it may consist of a class of distributions


(i.e., parameters not specified), or it may be completely general,

H1: f(x) ≠ f0(x).

Another general goodness of fit problem arises when the alternative is any distribution with a specified shape. Historically, non-normal alternatives are divided into three shape classes based on the comparison of their third and fourth standardized moments (denoted √β1 and β2, respectively) to those of the normal distribution. A distribution whose standardized third moment (skewness) is different from 0 is necessarily skewed. The value of the standardized fourth moment (kurtosis) for a normal distribution is 3, although a value of 3 does not necessarily indicate a normal distribution (e.g., Johnson, Tietjen and Beckman, 1980; Balanda and MacGillivray, 1988). Symmetric alternatives are often separated into those with population kurtosis less than or greater than 3.

In this text we consider tests of the composite hypothesis of normality

H0: f(x) = (1/(σ√(2π))) exp(-(x - μ)^2/(2σ^2)),   -∞ < x < ∞,

where both the mean (μ) and standard deviation (σ) are unknown. This is generally the case of interest in practice. In the past many general goodness of fit tests (specifically those based on the empirical distribution function [EDF tests]) required complete specification of the null parameters, to their disadvantage. Stephens (1974) and others improved the use of these tests by developing EDF tests for composite hypotheses. We examine tests derived for specific (except for parameters) alternatives, shape alternatives and the general alternative.

Some tests, such as likelihood ratio tests and most powerful location and scale invariant tests (Chapter 4), were derived for detecting a specific alternative to normality. These are based on the joint probabilities of the null and alternative distributions, given the values of the observations. The disadvantages of these tests are that many of them cannot be calculated in closed form, critical values are rarely available, and they may not be efficient as tests of normality if in fact neither the null nor the specified alternative hypothesis is correct. One might consider rather a more general test which is useful in detecting an alternative of similar shape to the specified alternative. On the other hand, some likelihood ratio tests are useful in testing for a broader set of alternatives.

Shape tests are divided into two classes, tests for skewed alternatives and tests for non-normal symmetric alternatives. Most shape tests can further be broken down into directional and bidirectional tests. Directional tests for skewness are used when a left or right skewed distribution is known


to be the alternative of concern. A bidirectional skewness test is used when a skewed alternative is of concern but the direction of skewness is not known.

For symmetric alternatives, directional tests are used when the alternative is assumed to be heavy- or light-tailed. A bidirectional test is used when it is assumed only that the alternative is symmetric and non-normal.

Omnibus tests are designed to cover all possible alternatives. They are not usually as powerful as specific or (directional or bidirectional) shape tests when the characteristics of the true alternative are correctly identified. These are usually single-tailed tests, e.g., the Wilk-Shapiro W test (Shapiro and Wilk, 1965) and the probability plot correlation test (Filliben, 1975). Combinations of directional tests have also been suggested as omnibus tests (e.g., D'Agostino and Pearson, 1973).

The tests which we will describe are also location and scale invariant, i.e., a change in the location or scale of the observations does not affect the test statistic or the resulting test:

T(x1, x2, ..., xn) = T(kx1 - u, kx2 - u, ..., kxn - u)

for constants k and u. This is a desirable property of a test since the parameters do not affect the shape of the normal distribution. For most statistical procedures, the distributional assumptions made usually concern only shape.
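The invariance property is easy to verify numerically for moment-based statistics. A quick sketch (the sample, constants and scipy calls are our own illustration): sample skewness and kurtosis are unchanged when every observation is rescaled and shifted.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.gamma(shape=2.0, scale=1.0, size=500)  # an arbitrary skewed sample

k, u = 3.7, 12.0  # arbitrary scale (k > 0) and location constants
y = k * x - u

# The moment-based shape statistics are unchanged by the transformation.
assert np.isclose(stats.skew(x), stats.skew(y))
assert np.isclose(stats.kurtosis(x), stats.kurtosis(y))
print("skewness and kurtosis are invariant under x -> k*x - u")
```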

Tests for normality that will be discussed come under one of three possibilities: for a specified H0 and H1, a test statistic t, a specified significance level α, and a constant k chosen appropriately, normality may be rejected if

(1) t < k_l,α. This type of test is common for regression tests and some directional tests (e.g., tests for alternatives skewed to the left).

(2) t > k_u,α. Likelihood ratio tests, tests for outliers, and some directional tests are usually of this form.

(3) t < k_l,α/2 or t > k_u,α/2. A two-tailed test is most often used for alternatives where the characteristics of the shape are specified but not the direction (e.g., symmetric alternative but it is not known if it has long or short tails).
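When critical values are not tabled, the lower and upper constants can be approximated by simulating the null distribution of the statistic. The following sketch (our own illustration; sample size, replication count and seed are arbitrary) simulates two-tailed critical values for the sample kurtosis b2 under normality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, reps, alpha = 50, 10000, 0.05

# Null distribution of the sample kurtosis b2 (standardized fourth
# moment, not excess kurtosis) for samples of size 50 from N(0, 1).
null_b2 = stats.kurtosis(rng.normal(size=(reps, n)), axis=1,
                         fisher=False, bias=True)

k_lower = np.quantile(null_b2, alpha / 2)
k_upper = np.quantile(null_b2, 1 - alpha / 2)

# The two-tailed test rejects normality when b2 < k_lower or b2 > k_upper.
print(f"simulated critical values: ({k_lower:.2f}, {k_upper:.2f})")
```

Note that the simulated null distribution of b2 is centered below 3 and is right-skewed for moderate n, so the two critical values are not symmetric about 3.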

In most cases the tests discussed in the text can be modified to deal with specified parameters by substituting the hypothesized values for the


estimated values in the test statistic formula. However, adjustments to the critical values for the test must also be made. These types of tables are rarely available. One notable exception is Barnett and Lewis (1994) who provided tables of critical values for some outlier tests with 1, 2 or no parameters specified.

    1.3 Symmetric Distributions and the Meaning of Kurtosis

A symmetric distribution is often called "heavy" or "long" tailed if the standardized fourth moment is greater than the normal value of 3 and called "light" or "short" tailed if the value of β2 is less than three. Alternatively, these distributions are also sometimes called "peaked" or "flat", respectively, in relation to the normal distribution. The nomenclature for these two symmetric shape classes is misleading. There has been much discussion over whether β2 describes peakedness, tail length/size, or the "shoulders" of a distribution. Dyson (1943) and Finucan (1964) showed that, for two symmetric densities g(x) and f(x) with mean 0 and equal variance, if

f(x) < g(x) for a < |x| < b and f(x) > g(x) for |x| < a or |x| > b,

then μ4(f) > μ4(g); i.e., if the value of the density f(x) is lower than that of g(x) in some interval between the mean and the tails (the "shoulders" of the distribution) and higher elsewhere, then f(x) has the higher kurtosis. (For densities with 0 means and equal variance, μ4 is essentially kurtosis; see Chapter 3.)

Darlington (1970) claimed that kurtosis is actually a measure of the probability mass around the two values μ ± σ and therefore should be interpreted as a measure of bimodality around those two points. Following up on Darlington's work, Moors (1986) described kurtosis as a measure of dispersion around the two values μ ± σ, declaring the interpretation of bimodality to be false. Large values of β2 occur with probability densities that have less probability mass around these two values (the "shoulders"). Since the mass must equal 1, the probability mass must be concentrated either near the mean μ or in the tails. By assimilating the results of Dyson, Finucan and Moors, one sees that a distribution with kurtosis higher than normal must have higher density values at μ (causing peakedness) and/or in the tails (causing heavy tails). Depending on how the mass is dispersed in the center and the tails of the distribution, different symmetric distributions may therefore have the same kurtosis value. Figure 1.2 shows the


Figure 1.2 Comparison of the normal (β2 = 3), Laplace (β2 = 6) and t6 (β2 = 6) densities.

density functions of the Laplace distribution and the t distribution with 6 degrees of freedom, both of which have kurtosis values of 6, compared to the normal distribution with the same mean and variance. As can be seen, the Laplace is considerably more peaked than the t distribution; it is difficult to see differences in the tails on this scale although both the t and Laplace distributions have heavier tails than the normal.
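The kurtosis values quoted for these two distributions can be confirmed directly. A quick check (our own illustration using scipy, which reports excess kurtosis β2 - 3):

```python
from scipy import stats

# scipy reports excess kurtosis (beta_2 - 3), so add 3 back.
beta2_laplace = 3 + float(stats.laplace.stats(moments="k"))
beta2_t6 = 3 + float(stats.t.stats(6, moments="k"))  # excess = 6/(df - 4)

print(beta2_laplace, beta2_t6)  # prints: 6.0 6.0
```

Two quite differently shaped symmetric densities thus share β2 = 6, which is the point of Figure 1.2.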

The use of kurtosis, and of skewness for that matter, as a measure of the degree of non-normality in a distribution is somewhat arbitrary, as others have proposed alternative measures of shape. Groeneveld and Meeden (1984) define a variety of other measures of distribution skewness and kurtosis; Crow and Siddiqui (1967), Uthoff (1968), Rogers and Tukey (1972), Filliben (1975), Ruppert (1987) and Moors (1988), among others, define other measures of skewness and departures from non-normal symmetry which are based on percentiles. Kendall and Stuart (1977) attribute the measure (x̄ - xm)/s as a measure of skewness in a sample to Karl Pearson, where x̄, xm and s are the sample mean, mode and standard deviation, respectively. Use of the median rather than the mode in this measure has also been suggested (Kendall and Stuart, 1977; Groeneveld and Meeden, 1984).
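The median variant of Pearson's measure is simple to compute. A sketch (the exponential sample and seed are our own illustration): on a right-skewed sample, both the median-based measure and the moment skewness point the same way.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.exponential(scale=1.0, size=1000)  # a right-skewed sample

# Median variant of Pearson's measure, (mean - median) / s.
pearson_skew = (x.mean() - np.median(x)) / x.std(ddof=1)
moment_skew = stats.skew(x)  # the moment-based sqrt(b1), for comparison

# Both measures are positive for this right-skewed sample.
print(pearson_skew > 0, moment_skew > 0)
```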

The definition of kurtosis and the presentation of alternatives has been reviewed in some detail by Balanda and MacGillivray (1988), who also provided a fairly comprehensive list of references on the meaning of kurtosis and some alternative measures to kurtosis. It is hoped that the reader will realize that kurtosis is not the final word in symmetric distributions;


for example, as mentioned before, D'Agostino and Lee (1977) found quite different estimation efficiencies for the mean when sampled from two symmetric non-normal distributions even though they had equal population kurtosis values. Other results based purely on the kurtosis values of a single family of distributions, such as those presented by Geary (1947), should also not be taken as universal.

However, at the risk of perpetuating a myth, in this text we will use β2 to define the shape classes of symmetric distributions, and to impart a sense of the degree of the "non-normalness" of a symmetric distribution. We trust this will not cause much concern, since we will in general limit the use of β2 as a division of symmetric distributions into two classes rather than as a quantitative value.

    1.4 Objectives of This Text

Our main objective is to present, as completely as possible, a viable and valuable list of tests and procedures that can be used for assessing normality, including both univariate and multivariate normality. We compare tests in terms of power and ease of use.

Chapters 2 through 7 describe procedures for assessing univariate normality in complete samples. Chapter 2 describes probability plotting methods and regression and correlation tests. Chapter 3 contains tests based on sample moments, and moment-type tests. In Chapter 4 we present other tests specifically derived for testing normality. Chapter 5 describes general goodness of fit tests and their usefulness in testing specifically for normality. In Chapter 6 we present tests specifically designed to detect outliers. In Chapter 7 we summarize results of power studies and comparisons of the various univariate tests for normality presented in the preceding five chapters. Chapter 8 also focuses on univariate samples, but considers the case of censored data.

Chapters 9 and 10 are concerned with assessing normality in multivariate samples. Chapter 9 describes tests for multivariate normality, while Chapter 10 considers those tests designed to detect multivariate outliers. Chapter 11 focuses on a more specific problem, that of testing for mixtures of normal distributions, in both the univariate and multivariate cases.

Chapter 12 presents basic methods for robust estimation, which can be used in the event that data are determined not to be normal. Chapter 13 describes various computational issues in assessing normality. Appendices contain data sets used in the examples presented throughout the text, and tables of parameter and critical values for many of the procedures described.


References

Balanda, K.P., and MacGillivray, H.L. (1988). Kurtosis: a critical review. American Statistician 42, 111-119.

Barnett, V., and Lewis, T. (1994). Outliers in Statistical Data, 2nd ed. John Wiley and Sons, New York.

Bartlett, M.S. (1935). The effect of non-normality on the t-distribution. Proceedings of the Cambridge Philosophical Society 31, 223-231.

Box, G.E.P. (1953). Non-normality and tests on variances. Biometrika 40, 318-335.

Crow, E.L., and Siddiqui, M.M. (1967). Robust estimation of location. Journal of the American Statistical Association 62, 353-389.

D'Agostino, R.B., Belanger, A., and D'Agostino, Jr., R.B. (1990). A suggestion for using powerful and informative tests of normality. American Statistician 44, 316-321.

D'Agostino, R.B., and Lee, A.F.S. (1977). Robustness of location estimators under changes of population kurtosis. Journal of the American Statistical Association 72, 393-396.

D'Agostino, R., and Pearson, E.S. (1973). Tests for departure from normality. Empirical results for the distributions of b2 and √b1. Biometrika 62, 243-250.

Darlington, R.B. (1970). Is kurtosis really 'peakedness'? American Statistician 24, 19-22.

Dyson, F.J. (1943). A note on kurtosis. Journal of the Royal Statistical Society B 106, 360-361.

Filliben, J.J. (1975). The probability plot correlation coefficient test for normality. Technometrics 17, 111-117.

Finucan, H.M. (1964). A note on kurtosis. Journal of the Royal Statistical Society B 26, 111-112.

Fisher, R.A. (1930). The moments of the distribution for normal samples of measures of departure from normality. Proceedings of the Royal Society of London A 130, 16-28.

Geary, R.C. (1947). Testing for normality. Biometrika 34, 209-242.

Groeneveld, R.A., and Meeden, G. (1984). Measuring skewness and kurtosis. The Statistician 33, 391-399.


Johnson, M.E., Tietjen, G.L., and Beckman, R.J. (1980). A new family of probability distributions with application to Monte Carlo studies. Journal of the American Statistical Association 75, 276-279.

Kendall, M., and Stuart, A. (1977). The Advanced Theory of Statistics, Vol. I. MacMillan Publishing Co., New York.

Moors, J.J.A. (1986). The meaning of kurtosis: Darlington reexamined. American Statistician 40, 283-284.

Moors, J.J.A. (1988). A quantile alternative for kurtosis. The Statistician 37, 25-32.

Pearson, E.S. (1930). A further development of tests for normality. Biometrika 22, 239-249.

Pearson, E.S., and Please, N.W. (1975). Relation between the shape of population distribution and the robustness of four simple test statistics. Biometrika 62, 223-241.

Pitman, E.J.G. (1937a). Significance tests which may be applied to samples from any population. Supplement to the Journal of the Royal Statistical Society 4, 119-130.

Pitman, E.J.G. (1937b). Significance tests which may be applied to samples from any populations: III. The analysis of variance test. Biometrika 29, 322-335.

Rogers, W.H., and Tukey, J.W. (1972). Understanding some long-tailed symmetrical distributions. Statistica Neerlandica 26, 211-226.

Ruppert, D. (1987). What is kurtosis? American Statistician 41, 1-5.

Shapiro, S.S., and Wilk, M.B. (1965). An analysis of variance test for normality (complete samples). Biometrika 52, 591-611.

Stephens, M.A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association 69, 730-737.

Subrahmaniam, K., Subrahmaniam, K., and Messeri, J.Y. (1975). On the robustness of some tests of significance in sampling from a compound normal distribution. Journal of the American Statistical Association 70, 435-438.

Thode, Jr., H.C. (1985). Power of absolute moment tests against symmetric non-normal alternatives. Ph.D. dissertation, University Microfilms, Ann Arbor, MI.


Thode, Jr., H.C., Finch, S.J., and Mendell, N.R. (1988). Simulated percentage points for the null distribution of the likelihood ratio test for a mixture of two normals. Biometrics 44, 1195-1201.

Thode, Jr., H.C., Smith, L.A., and Finch, S.J. (1983). Power of tests of normality for detecting scale contaminated normal samples. Communications in Statistics - Simulation and Computation 12, 675-695.

Tukey, J.W. (1960). A survey of sampling from contaminated distributions. In I. Olkin, S.G. Ghurye, W. Hoeffding, W.G. Madow, and H.B. Mann, eds., Contributions to Probability and Statistics, Stanford Univ. Press, CA, 448-485.

Uthoff, V.A. (1968). Some scale and origin invariant tests for distributional assumptions. Ph.D. dissertation, University Microfilms, Ann Arbor, MI.


TESTING FOR NORMALITY

CONTENTS

Chapter 1 Introduction
1.1 Why Test for Normality?
1.1.1 Historical Review of Research on Whether Assumption of Normality Is Valid or Important
1.1.2 Genetic Applications and Tests of Clustering
1.1.3 Comparative Evaluation of General Goodness of Fit Tests to Normality Tests
1.2 Hypothesis Testing for Distributional Assumptions
1.3 Symmetric Distributions and the Meaning of Kurtosis
1.4 Objectives of This Text
References