TESTING FOR NORMALITY

HENRY C. THODE, JR.
State University of New York at Stony Brook
Stony Brook, New York

MARCEL DEKKER, INC.
NEW YORK • BASEL

Copyright © 2002 by Marcel Dekker, Inc. All Rights Reserved.



ISBN: 0-8247-9613-6

This book is printed on acid-free paper.

Headquarters
Marcel Dekker, Inc.
270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540

Eastern Hemisphere Distribution
Marcel Dekker AG
Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-261-8482; fax: 41-61-261-8896

World Wide Web
http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.

Copyright © 2002 by Marcel Dekker, Inc. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Current printing (last digit):
10 9 8 7 6 5 4 3 2 1

PRINTED IN THE UNITED STATES OF AMERICA


STATISTICS: Textbooks and Monographs

D. B. Owen, Founding Editor, 1972-1991

1. The Generalized Jackknife Statistic, H. L. Gray and W. R. Schucany
2. Multivariate Analysis, Anant M. Kshirsagar
3. Statistics and Society, Walter T. Federer
4. Multivariate Analysis: A Selected and Abstracted Bibliography, 1957-1972, Kocherlakota Subrahmaniam and Kathleen Subrahmaniam
5. Design of Experiments: A Realistic Approach, Virgil L. Anderson and Robert A. McLean
6. Statistical and Mathematical Aspects of Pollution Problems, John W. Pratt
7. Introduction to Probability and Statistics (in two parts), Part I: Probability; Part II: Statistics, Narayan C. Giri
8. Statistical Theory of the Analysis of Experimental Designs, J. Ogawa
9. Statistical Techniques in Simulation (in two parts), Jack P. C. Kleijnen
10. Data Quality Control and Editing, Joseph I. Naus
11. Cost of Living Index Numbers: Practice, Precision, and Theory, Kali S. Banerjee
12. Weighing Designs: For Chemistry, Medicine, Economics, Operations Research, Statistics, Kali S. Banerjee
13. The Search for Oil: Some Statistical Methods and Techniques, edited by D. B. Owen
14. Sample Size Choice: Charts for Experiments with Linear Models, Robert E. Odeh and Martin Fox
15. Statistical Methods for Engineers and Scientists, Robert M. Bethea, Benjamin S. Duran, and Thomas L. Boullion
16. Statistical Quality Control Methods, Irving W. Burr
17. On the History of Statistics and Probability, edited by D. B. Owen
18. Econometrics, Peter Schmidt
19. Sufficient Statistics: Selected Contributions, Vasant S. Huzurbazar (edited by Anant M. Kshirsagar)
20. Handbook of Statistical Distributions, Jagdish K. Patel, C. H. Kapadia, and D. B. Owen
21. Case Studies in Sample Design, A. C. Rosander
22. Pocket Book of Statistical Tables, compiled by R. E. Odeh, D. B. Owen, Z. W. Birnbaum, and L. Fisher
23. The Information in Contingency Tables, D. V. Gokhale and Solomon Kullback
24. Statistical Analysis of Reliability and Life-Testing Models: Theory and Methods, Lee J. Bain
25. Elementary Statistical Quality Control, Irving W. Burr
26. An Introduction to Probability and Statistics Using BASIC, Richard A. Groeneveld
27. Basic Applied Statistics, B. L. Raktoe and J. J. Hubert
28. A Primer in Probability, Kathleen Subrahmaniam
29. Random Processes: A First Look, R. Syski
30. Regression Methods: A Tool for Data Analysis, Rudolf J. Freund and Paul D. Minton
31. Randomization Tests, Eugene S. Edgington
32. Tables for Normal Tolerance Limits, Sampling Plans and Screening, Robert E. Odeh and D. B. Owen
33. Statistical Computing, William J. Kennedy, Jr., and James E. Gentle
34. Regression Analysis and Its Application: A Data-Oriented Approach, Richard F. Gunst and Robert L. Mason
35. Scientific Strategies to Save Your Life, I. D. J. Bross
36. Statistics in the Pharmaceutical Industry, edited by C. Ralph Buncher and Jia-Yeong Tsay
37. Sampling from a Finite Population, J. Hajek


38. Statistical Modeling Techniques, S. S. and A. J.
39. Statistical Theory and Inference in Research, T. A. Bancroft and C.-P. Han
40. Handbook of the Normal Distribution, Jagdish K. Patel and Campbell B. Read
41. Recent Advances in Regression Methods, Hrishikesh D. Vinod and Aman Ullah
42. Acceptance Sampling in Quality Control, Edward G. Schilling
43. The Randomized Clinical Trial and Therapeutic Decisions, edited by Niels Tygstrup, John M. Lachin, and Erik Juhl
44. Regression Analysis of Survival Data in Cancer Chemotherapy, Walter H. Carter, Jr., Galen L. Wampler, and Donald M. Stablein
45. A Course in Linear Models, Anant M. Kshirsagar
46. Clinical Trials: Issues and Approaches, edited by Stanley H. Shapiro and Thomas H. Louis
47. Statistical Analysis of DNA Sequence Data, edited by B. S. Weir
48. Nonlinear Regression Modeling: A Unified Practical Approach, David A. Ratkowsky
49. Attribute Sampling Plans, Tables of Tests and Confidence Limits for Proportions, Robert E. Odeh and D. B. Owen
50. Experimental Design, Statistical Models, and Genetic Statistics, edited by Klaus Hinkelmann
51. Statistical Methods for Cancer Studies, edited by Richard G. Cornell
52. Practical Statistical Sampling for Auditors, Arthur J. Wilburn
53. Statistical Methods for Cancer Studies, edited by Edward J. Wegman and James G. Smith
54. Self-Organizing Methods in Modeling: GMDH Type Algorithms, edited by Stanley J. Farlow
55. Applied Factorial and Fractional Designs, Robert A. McLean and Virgil L. Anderson
56. Design of Experiments: Ranking and Selection, edited by Thomas J. Santner and Ajit C. Tamhane
57. Statistical Methods for Engineers and Scientists: Second Edition, Revised and Expanded, Robert M. Bethea, Benjamin S. Duran, and Thomas L. Boullion
58. Ensemble Modeling: Inference from Small-Scale Properties to Large-Scale Systems, Alan E. Gelfand and Crayton C. Walker
59. Computer Modeling for Business and Industry, Bruce L. Bowerman and Richard T. O'Connell
60. Bayesian Analysis of Linear Models, Lyle D. Broemeling
61. Methodological Issues for Health Care Surveys, Brenda Cox and Steven Cohen
62. Applied Regression Analysis and Experimental Design, Richard J. Brook and Gregory C. Arnold
63. Statpal: A Statistical Package for Microcomputers—PC-DOS Version for the IBM PC and Compatibles, Bruce J. Chalmer and David G. Whitmore
64. Statpal: A Statistical Package for Microcomputers—Apple Version for the II, II+, and IIe, David G. Whitmore and Bruce J. Chalmer
65. Nonparametric Statistical Inference: Second Edition, Revised and Expanded, Jean Dickinson Gibbons
66. Design and Analysis of Experiments, Roger G. Petersen
67. Statistical Methods for Pharmaceutical Research Planning, Sten W. Bergman and John C. Gittins
68. Goodness-of-Fit Techniques, edited by Ralph B. D'Agostino and Michael A. Stephens
69. Statistical Methods in Discrimination Litigation, edited by D. H. Kaye and Mikel Aickin
70. Truncated and Censored Samples from Normal Populations, Helmut Schneider
71. Robust Inference, M. L. Tiku, W. Y. Tan, and N. Balakrishnan
72. Statistical Image Processing and Graphics, edited by Edward J. Wegman and Douglas J. DePriest
73. Assignment Methods in Combinatorial Data Analysis, Lawrence J. Hubert
74. Econometrics and Structural Change, Lyle D. Broemeling and Hiroki Tsurumi
75. Multivariate Interpretation of Clinical Laboratory Data, Adelin Albert and Eugene K. Harris


76. Statistical Tools for Simulation Practitioners, P. C.
77. Randomization Tests: Second Edition, Eugene S. Edgington
78. A Folio of Distributions: A Collection of Theoretical Quantile-Quantile Plots, Edward B. Fowlkes
79. Applied Categorical Data Analysis, Daniel H. Freeman, Jr.
80. Seemingly Unrelated Regression Equations Models: Estimation and Inference, Virendra K. Srivastava and David E. A. Giles
81. Response Surfaces: Designs and Analyses, Andre I. Khuri and John A. Cornell
82. Nonlinear Parameter Estimation: An Integrated System in BASIC, John C. Nash and Mary Walker-Smith
83. Cancer Modeling, edited by James R. Thompson and Barry W. Brown
84. Mixture Models: Inference and Applications to Clustering, Geoffrey J. McLachlan and Kaye E. Basford
85. Randomized Response: Theory and Techniques, Arijit Chaudhuri and Rahul Mukerjee
86. Biopharmaceutical Statistics for Drug Development, edited by Karl E. Peace
87. Parts per Million Values for Estimating Quality Levels, Robert E. Odeh and D. B. Owen
88. Lognormal Distributions: Theory and Applications, edited by Edwin L. Crow and Kunio Shimizu
89. Properties of Estimators for the Gamma Distribution, K. O. Bowman and L. R. Shenton
90. Spline Smoothing and Nonparametric Regression, Randall L. Eubank
91. Linear Least Squares Computations, R. W. Farebrother
92. Exploring Statistics, Damaraju Raghavarao
93. Applied Time Series Analysis for Business and Economic Forecasting, Sufi M. Nazem
94. Bayesian Analysis of Time Series and Dynamic Models, edited by James C. Spall
95. The Inverse Gaussian Distribution: Theory, Methodology, and Applications, Raj S. Chhikara and J. Leroy Folks
96. Parameter Estimation in Reliability and Life Span Models, A. Clifford Cohen and Betty Jones Whitten
97. Pooled Cross-Sectional and Time Series Data Analysis, Terry E. Dielman
98. Random Processes: A First Look, Second Edition, Revised and Expanded, R. Syski
99. Generalized Poisson Distributions: Properties and Applications, P. C. Consul
100. Nonlinear Lp-Norm Estimation, Rene Gonin and Arthur H. Money
101. Model Discrimination for Nonlinear Regression Models, Dale S. Borowiak
102. Applied Regression Analysis in Econometrics, Howard E. Doran
103. Continued Fractions in Statistical Applications, K. O. Bowman and L. R. Shenton
104. Statistical Methodology in the Pharmaceutical Sciences, Donald A. Berry
105. Experimental Design in Biotechnology, Perry D. Haaland
106. Statistical Issues in Drug Research and Development, edited by Karl E. Peace
107. Handbook of Nonlinear Regression Models, David A. Ratkowsky
108. Robust Regression: Analysis and Applications, edited by Kenneth D. Lawrence and Jeffrey L. Arthur
109. Statistical Design and Analysis of Industrial Experiments, edited by Subir Ghosh
110. U-Statistics: Theory and Practice, A. J. Lee
111. A Primer in Probability: Second Edition, Revised and Expanded, Kathleen Subrahmaniam
112. Data Quality Control: Theory and Pragmatics, edited by Gunar E. Liepins and V. R. R. Uppuluri
113. Engineering Quality by Design: Interpreting the Taguchi Approach, Thomas B. Barker
114. Survivorship Analysis for Clinical Studies, Eugene K. Harris and Adelin Albert
115. Statistical Analysis of Reliability and Life-Testing Models: Second Edition, Lee J. Bain and Max Engelhardt
116. Stochastic Models of Carcinogenesis, Wai-Yuan Tan
117. Statistics and Society: Data Collection and Interpretation, Second Edition, Revised and Expanded, Walter T. Federer
118. Handbook of Sequential Analysis, B. K. Ghosh and P. K. Sen
119. Truncated and Censored Samples: Theory and Applications, A. Clifford Cohen


120. Survey Sampling Principles, E. K.
121. Applied Engineering Statistics, Robert M. Bethea and R. Russell Rhinehart
122. Sample Size Choice: Charts for Experiments with Linear Models: Second Edition, Robert E. Odeh and Martin Fox
123. Handbook of the Logistic Distribution, edited by N. Balakrishnan
124. Fundamentals of Biostatistical Inference, Chap T. Le
125. Correspondence Analysis Handbook, J.-P. Benzecri
126. Quadratic Forms in Random Variables: Theory and Applications, A. M. Mathai and Serge B. Provost
127. Confidence Intervals on Variance Components, Richard K. Burdick and Franklin A. Graybill
128. Biopharmaceutical Sequential Statistical Applications, edited by Karl E. Peace
129. Item Response Theory: Parameter Estimation Techniques, Frank B. Baker
130. Survey Sampling: Theory and Methods, Arijit Chaudhuri and Horst Stenger
131. Nonparametric Statistical Inference: Third Edition, Revised and Expanded, Jean Dickinson Gibbons and Subhabrata Chakraborti
132. Bivariate Discrete Distribution, Subrahmaniam Kocherlakota and Kathleen Kocherlakota
133. Design and Analysis of Bioavailability and Bioequivalence Studies, Shein-Chung Chow and Jen-pei Liu
134. Multiple Comparisons, Selection, and Applications in Biometry, edited by Fred M. Hoppe
135. Cross-Over Experiments: Design, Analysis, and Application, David A. Ratkowsky, Marc A. Evans, and J. Richard Alldredge
136. Introduction to Probability and Statistics: Second Edition, Revised and Expanded, Narayan C. Giri
137. Applied Analysis of Variance in Behavioral Science, edited by Lynne K. Edwards
138. Drug Safety Assessment in Clinical Trials, edited by Gene S. Gilbert
139. Design of Experiments: A No-Name Approach, Thomas J. Lorenzen and Virgil L. Anderson
140. Statistics in the Pharmaceutical Industry: Second Edition, Revised and Expanded, edited by C. Ralph Buncher and Jia-Yeong Tsay
141. Advanced Linear Models: Theory and Applications, Song-Gui Wang and Shein-Chung Chow
142. Multistage Selection and Ranking Procedures: Second-Order Asymptotics, Nitis Mukhopadhyay and Tumulesh K. S. Solanky
143. Statistical Design and Analysis in Pharmaceutical Science: Validation, Process Controls, and Stability, Shein-Chung Chow and Jen-pei Liu
144. Statistical Methods for Engineers and Scientists: Third Edition, Revised and Expanded, Robert M. Bethea, Benjamin S. Duran, and Thomas L. Boullion
145. Growth Curves, Anant M. Kshirsagar and William Boyce Smith
146. Statistical Bases of Reference Values in Laboratory Medicine, Eugene K. Harris and James C. Boyd
147. Randomization Tests: Third Edition, Revised and Expanded, Eugene S. Edgington
148. Practical Sampling Techniques: Second Edition, Revised and Expanded, Ranjan K. Som
149. Multivariate Statistical Analysis, Narayan C. Giri
150. Handbook of the Normal Distribution: Second Edition, Revised and Expanded, Jagdish K. Patel and Campbell B. Read
151. Bayesian Biostatistics, edited by Donald A. Berry and Dalene K. Stangl
152. Response Surfaces: Designs and Analyses, Second Edition, Revised and Expanded, Andre I. Khuri and John A. Cornell
153. Statistics of Quality, edited by Subir Ghosh, William R. Schucany, and William B. Smith
154. Linear and Nonlinear Models for the Analysis of Repeated Measurements, Edward F. Vonesh and Vernon M. Chinchilli
155. Handbook of Applied Economic Statistics, Aman Ullah and David E. A. Giles


156. Improving Efficiency by Shrinkage: The n and Ridge Regression Estimators, Marvin H. J. Gruber
157. Nonparametric Regression and Spline Smoothing: Second Edition, Randall L. Eubank
158. Asymptotics, Nonparametrics, and Time Series, edited by Subir Ghosh
159. Multivariate Analysis, Design of Experiments, and Survey Sampling, edited by Subir Ghosh
160. Statistical Process Monitoring and Control, edited by Sung H. Park and G. Geoffrey Vining
161. Statistics for the 21st Century: Methodologies for Applications of the Future, edited by C. R. Rao and Gabor J. Szekely
162. Probability and Statistical Inference, Nitis Mukhopadhyay
163. Handbook of Stochastic Analysis and Applications, edited by D. Kannan and V. Lakshmikantham
164. Testing for Normality, Henry C. Thode, Jr.

Additional Volumes in Preparation

Handbook of Applied Econometrics and Statistical Inference, edited by Aman Ullah, Alan T. K. Wan, and Anoop Chaturvedi

Visualizing Statistical Models and Concepts, R. W. Farebrother and Michael Schyns


To my children,

Matthew, John, and Samantha


PREFACE

In the development of statistical sampling theory it has often happened that more than one test of a given hypothesis is available. Generally on theoretical grounds it is possible to specify which of these tests is the most efficient; but it may happen that owing to mathematical difficulties in putting the ideal test into working form or to practical difficulties arising from the extent of computation involved, the statistician will choose to employ a second best but simpler test.

E.S. Pearson, 1935

The Gaussian or normal distribution has long been the focal point of much of statistical study, for a number of reasons. Data often approximates a normal "bell-shaped" curve; the normal distribution is mathematically easy to work with; and many statistics, both estimates and test statistics, as well as some distributions become normal asymptotically.

For the purposes of this text, we are primarily concerned with the first reason stated above. Normality (or lack thereof) of an underlying data distribution can affect, to a greater or lesser degree, the properties of estimation or inferential procedures used in the analysis of the data. To address these issues, formal as well as informal methods have been developed to ascertain the apparent normality of a data sample, so that the appropriateness of applying a statistical procedure to that sample can be determined. Fortunately (or unfortunately), a large number of methods for testing for normality have been developed, providing researchers with a wide range of choices; however, this abundance can also cause a good deal of confusion.

My objective in writing this text was to present, as completely as possible, goodness of fit tests that were designed or could be used for determining whether a sample of observations could have come from a normal distribution. My intent was to focus on methodology and utility of the tests rather than in-depth theoretical issues, so that readers would be able to make a judgment as to which test(s) would be best for their particular circumstances, and could easily perform the test. I intended to make this text accessible to researchers with a minimal amount of theoretical statistical background, and the sections in which I have delved into theory can generally be ignored without impairing the practical application of the methods. To my knowledge, this text contains the broadest and most comprehensive set of material published to date on the single subject of testing for normality.

I also hoped that by presenting this material I would provide a better understanding of underlying distributional assumptions and the effects of violating those assumptions. However, although more goodness of fit methodology is focused on the normal than on any other distribution, not all underlying assumptions are necessarily those of normality. Equipped with as complete as possible a description of tests for normality, readers concerned with goodness of fit in regard to other null distributions may develop analogous tests based on what is written here: perhaps tests can be extended or expanded to improve tests for multivariate normality, the exponential distribution, or Weibull distribution, for example.

Although some historical background and theory are provided for many of the tests, the emphasis here is on the calculation and performance of the tests. I have omitted complete details on the formulation of some of the tests identified herein, mainly those less useful for practical purposes. These were mentioned purely for the purpose of completeness. Although I limited this work to the normal distribution, on occasion the applicability of certain tests to other null distributions has been mentioned. I believe the large bibliography will be invaluable to anyone who feels the need to obtain more details in these areas.

This text comprises four sections. The first section (Chapter 1) is introductory. The second (Chapters 2 through 8) addresses the issue of testing for univariate normality in complete samples (Chapters 2-7) and censored samples (Chapter 8). The third section (Chapters 9 and 10) covers the topic of testing for multivariate normality. The remainder of the text covers additional miscellaneous topics, including normal mixture distributions (univariate and multivariate, Chapter 11), robust estimation (Chapter 12) and computational issues (Chapter 13). Data sets used in the examples throughout the book are included in Appendix A, and tables of critical values for most of the tests presented here are given in Appendix B.

Henry C. Thode, Jr.

Reference

Pearson, E.S. (1935). A comparison of β2 and Mr. Geary's wn criteria. Biometrika 27, 333-352.


CONTENTS

Preface

1. Introduction

TESTING FOR UNIVARIATE NORMALITY

2. Plots, Probability Plots and Regression Tests

3. Test Using Moments

4. Other Tests for Univariate Normality

5. Goodness of Fit Tests

6. Tests for Outliers

7. Power Comparisons for Univariate Tests for Normality

8. Testing for Normality with Censored Data

TESTING FOR MULTIVARIATE NORMALITY

9. Assessing Multivariate Normality


10. Testing for Multivariate Outliers

ADDITIONAL TOPICS

11. Testing for Normal Mixtures

12. Robust Estimation of Location and Scale

13. Computational Issues

Appendix A: Data Sets Used in Examples

Appendix B: Parameter and Critical Values

Appendix C: Function Optimization Computer Subroutine Variable Metric Method
