The IMA Volumes in Mathematics
and its Applications
Volume 34
Series Editors Avner Friedman Willard Miller, Jr.
Institute for Mathematics and its Applications
IMA
The Institute for Mathematics and its Applications was established by a grant from the National Science Foundation to the University of Minnesota in 1982. The IMA seeks to encourage the development and study of fresh mathematical concepts and questions of concern to the other sciences by bringing together mathematicians and scientists from diverse fields in an atmosphere that will stimulate discussion and collaboration.
The IMA Volumes are intended to involve the broader scientific community in this process.
Avner Friedman, Director
Willard Miller, Jr., Associate Director
* * * * * * * * * *
IMA PROGRAMS
1982-1983 Statistical and Continuum Approaches to Phase Transition
1983-1984 Mathematical Models for the Economics of Decentralized Resource Allocation
1984-1985 Continuum Physics and Partial Differential Equations
1985-1986 Stochastic Differential Equations and Their Applications
1986-1987 Scientific Computation
1987-1988 Applied Combinatorics
1988-1989 Nonlinear Waves
1989-1990 Dynamical Systems and Their Applications
1990-1991 Phase Transitions and Free Boundaries
* * * * * * * * * *
SPRINGER LECTURE NOTES FROM THE IMA:
The Mathematics and Physics of Disordered Media
Editors: Barry Hughes and Barry Ninham (Lecture Notes in Math., Volume 1035, 1983)
Orienting Polymers
Editor: J.L. Ericksen (Lecture Notes in Math., Volume 1063, 1984)
New Perspectives in Thermodynamics
Editor: James Serrin (Springer-Verlag, 1986)
Models of Economic Dynamics
Editor: Hugo Sonnenschein (Lecture Notes in Econ., Volume 264, 1986)
Werner Stahel Sanford Weisberg Editors
Directions in Robust Statistics and Diagnostics
Part II
With 53 Illustrations
Springer-Verlag New York Berlin Heidelberg London Paris Tokyo Hong Kong Barcelona
Werner Stahel Seminar für Statistik Swiss Federal Institute of Technology 8092 Zürich, Switzerland
Sanford Weisberg Department of Mathematics University of Minnesota St. Paul, MN 55108 USA
Mathematics Subject Classification: 62F, 62F35
Library of Congress Cataloging-in-Publication Data
Directions in robust statistics and diagnostics / Werner Stahel, Sanford Weisberg, editors.
p. cm. -- (The IMA volumes in mathematics and its applications; v. 33-34.)
Includes bibliographical references and index.
ISBN-13: 978-1-4612-8772-8 e-ISBN-13: 978-1-4612-4444-8 DOI: 10.1007/978-1-4612-4444-8
1. Robust statistics--Congresses. 2. Mathematical statistics--Congresses. I. Stahel, Werner. II. Weisberg, Sanford, 1947- . III. Title: Diagnostics. IV. Series.
QA276.A1D57 1991 519.5--dc20 91-9205
Printed on acid-free paper.
© 1991 Springer-Verlag New York, Inc. All rights reserved. This work may not be translated or copied in whole without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone. Permission to photocopy for internal or personal use, or the internal or personal use of specific clients, is granted by Springer-Verlag New York, Inc. for libraries registered with the Copyright Clearance Center (CCC), provided that the base fee of $0.00 per copy, plus $0.20 per page is paid directly to CCC, 21 Congress St., Salem, MA 01970, USA. Special requests should be addressed directly to Springer-Verlag New York, 175 Fifth Avenue, New York, NY 10010, USA.
9 8 7 6 5 4 3 2 1
ISBN-13:978-1-4612-8772-8
The IMA Volumes in Mathematics and its Applications
Current Volumes:
Volume 1: Homogenization and Effective Moduli of Materials and Media
Editors: Jerry Ericksen, David Kinderlehrer, Robert Kohn, J.-L. Lions
Volume 2: Oscillation Theory, Computation, and Methods of Compensated Compactness
Editors: Constantine Dafermos, Jerry Ericksen, David Kinderlehrer, Marshall Slemrod
Volume 3: Metastability and Incompletely Posed Problems
Editors: Stuart Antman, Jerry Ericksen, David Kinderlehrer, Ingo Müller
Volume 4: Dynamical Problems in Continuum Physics
Editors: Jerry Bona, Constantine Dafermos, Jerry Ericksen, David Kinderlehrer
Volume 5: Theory and Applications of Liquid Crystals
Editors: Jerry Ericksen and David Kinderlehrer
Volume 6: Amorphous Polymers and Non-Newtonian Fluids
Editors: Constantine Dafermos, Jerry Ericksen, David Kinderlehrer
Volume 7: Random Media
Editor: George Papanicolaou
Volume 8: Percolation Theory and Ergodic Theory of Infinite Particle Systems
Editor: Harry Kesten
Volume 9: Hydrodynamic Behavior and Interacting Particle Systems
Editor: George Papanicolaou
Volume 10: Stochastic Differential Systems, Stochastic Control Theory and Applications
Editors: Wendell Fleming and Pierre-Louis Lions
Volume 11: Numerical Simulation in Oil Recovery
Editor: Mary Fanett Wheeler
Volume 12: Computational Fluid Dynamics and Reacting Gas Flows
Editors: Bjorn Engquist, M. Luskin, Andrew Majda
Volume 13: Numerical Algorithms for Parallel Computer Architectures
Editor: Martin H. Schultz
Volume 14: Mathematical Aspects of Scientific Software
Editor: J.R. Rice
Volume 15: Mathematical Frontiers in Computational Chemical Physics
Editor: D. Truhlar
Volume 16: Mathematics in Industrial Problems
by Avner Friedman
Volume 17: Applications of Combinatorics and Graph Theory to the Biological and Social Sciences
Editor: Fred Roberts
Volume 18: q-Series and Partitions
Editor: Dennis Stanton
Volume 19: Invariant Theory and Tableaux
Editor: Dennis Stanton
Volume 20: Coding Theory and Design Theory Part I: Coding Theory
Editor: Dijen Ray-Chaudhuri
Volume 21: Coding Theory and Design Theory Part II: Design Theory
Editor: Dijen Ray-Chaudhuri
Volume 22: Signal Processing: Part I - Signal Processing Theory
Editors: L. Auslander, F.A. Grünbaum, J.W. Helton, T. Kailath, P. Khargonekar and S. Mitter
Volume 23: Signal Processing: Part II - Control Theory and Applications of Signal Processing
Editors: L. Auslander, F.A. Grünbaum, J.W. Helton, T. Kailath, P. Khargonekar and S. Mitter
Volume 24: Mathematics in Industrial Problems, Part 2
by Avner Friedman
Volume 25: Solitons in Physics, Mathematics, and Nonlinear Optics
Editors: Peter J. Olver and David H. Sattinger
Volume 26: Two Phase Flows and Waves
Editors: Daniel D. Joseph and David G. Schaeffer
Volume 27: Nonlinear Evolution Equations that Change Type
Editors: Barbara Lee Keyfitz and Michael Shearer
Volume 28: Computer Aided Proofs in Analysis
Editors: Kenneth R. Meyer and Dieter S. Schmidt
Volume 29: Multidimensional Hyperbolic Problems and Computations
Editors: James Glimm and Andrew Majda
Volume 31: Mathematics in Industrial Problems, Part 3
by Avner Friedman
Volume 32: Radar and Sonar, Part 1
by Richard Blahut, Willard Miller, Jr. and Calvin Wilcox
Volume 33: Directions in Robust Statistics and Diagnostics: Part I
Editors: Werner A. Stahel and Sanford Weisberg
Volume 34: Directions in Robust Statistics and Diagnostics: Part II
Editors: Werner A. Stahel and Sanford Weisberg
Forthcoming Volumes:
1988-1989: Nonlinear Waves
Microlocal Analysis and Nonlinear Waves
Summer Program 1989: Robustness, Diagnostics, Computing and Graphics in Statistics
Computing and Graphics in Statistics
1989-1990: Dynamical Systems and Their Applications
Patterns and Dynamics in Reactive Media
Dynamical Issues in Combustion Theory
Twist Mappings and Their Applications
Dynamical Theories of Turbulence in Fluid Flows
Nonlinear Phenomena in Atmospheric and Oceanic Sciences
Chaotic Processes in the Geological Sciences
Summer Program 1990: Radar/Sonar
Radar and Sonar, Part 2
Summer Program 1990: Time Series in Time Series Analysis
Time Series (2 volumes)
1990-1991: Phase Transitions and Free Boundaries
On the Evolution of Phase Boundaries
Shock Induced Transitions and Phase Structures
Microstructure and Phase Transitions
FOREWORD
This IMA Volume in Mathematics and its Applications
DIRECTIONS IN ROBUST STATISTICS AND DIAGNOSTICS
is based on the proceedings of the first four weeks of the six week IMA 1989 summer program "Robustness, Diagnostics, Computing and Graphics in Statistics". An important objective of the organizers was to draw a broad set of statisticians working in robustness or diagnostics into collaboration on the challenging problems in these areas, particularly on the interface between them. We thank the organizers of the robustness and diagnostics program Noel Cressie, Thomas P. Hettmansperger, Peter J. Huber, R. Douglas Martin, and especially Werner Stahel and Sanford Weisberg who edited the proceedings.
Avner Friedman
Willard Miller, Jr.
PREFACE
Central themes of all statistics are estimation, prediction, and making decisions under uncertainty. A standard approach to these goals is through parametric modelling. Parametric models can give a problem sufficient structure to allow standard, well understood paradigms to be applied to make the required inferences. If, however, the parametric model is not completely correct, then the standard inferential methods may not give reasonable answers. In the last quarter century, particularly with the advent of readily available computing, more attention has been paid to the problem of inference when the parametric model used is not correctly specified. Robust procedures and diagnostic methods form two approaches to this general problem. In robust statistics, one seeks new inferential methods that are rather insensitive to, or robust against, certain types of failures in the parametric model, so good answers are obtained even if some assumptions are only approximately true. Diagnostics have traditionally taken a somewhat different view. Rather than modifying the fitting method, diagnostics condition on the fit using standard methods, to attempt to diagnose incorrect assumptions, allowing the analyst to modify them and refit under the new set of assumptions. There is clearly a common ground for both robust methods and diagnostics, since both are concerned with allowing an analyst to make sensible inferences even if a correct model is not known beforehand.
On this basis, a conference on both these topics was organized as part of the Summer 1989 Program of the Institute for Mathematics and Its Applications at the University of Minnesota. Most of the papers in these volumes present written versions of talks given at that conference. They cover approaches to robust statistics and to diagnostics as well as overviews and presentations of specific methods for specific models. We hope that these volumes will allow the reader to gain an overview of large parts of the research activities in the two fields.
Much of both robust estimation and diagnostics finds its beginnings in the work of John W. Tukey. In his presentation at the conference, Tukey made it clear that there is little room for antagonism between "opposing" robustness and diagnostic camps, and that the methodologies are largely complementary. Robust estimation, which has been largely concerned with estimation when the error distribution is not completely known, " ... makes unnecessary getting the stochastic part of the model right." Diagnostics, on the other hand, can " ... help to make the functional part of the model right." Between the two, the analyst may have several powerful tools to help in modelling. In the same spirit, Peter Huber writes, "Robustness and diagnostics are complementary" as the first heading of his paper in this volume.
The two fields have developed in different ways. In robust statistics, new procedures have been derived from theoretical considerations, but they have not found their way into widespread application. This lack of acceptance has been a continuing topic in many informal discussions before and during the meeting. Short written contributions to this theme have been collected by one of us (W. A. Stahel, "Robust Statistics: From an Intellectual Game to a Consumer Product", IMA Preprint Series, IMA, Minneapolis, August 1989). Tukey presents his view, which was a basis for the discussions, in his paper on "Consumer Datesware". Diagnostics, on the other hand, have been designed to supplement standard methodology with both graphical and non-graphical procedures. Many diagnostics, particularly graphical ones, have been generally included in common computing packages. A theoretical basis for some diagnostic methods, however, has been a recent development, and is a topic of several of the papers in this volume.
Many of the papers concerned with robustness use the two best-known approaches to robustness against deviations from the assumed distribution, which are described in the books by P. J. Huber ("Robust Statistics", Wiley, N.Y., 1981) and by F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel ("Robust Statistics: The Approach Based on Influence Functions", Wiley, N.Y., 1986). A more recent approach is described in Morgenthaler's paper. Rieder studies an approach to testing. In the field of extremely robust estimators, called high breakdown point procedures, there has been much recent activity, and this is reflected in the papers by Lopuhaä, Maronna & Yohai, Martin & Yohai, Rousseeuw & Bassett, Rousseeuw & van Zomeren, and Tyler. Apart from distributional assumptions, statistical models include a statement of independence or known dependence structure and of equal distribution, possibly up to a normalizing parameter or the like. The consequences of deviations from the first type of assumptions are described by Künsch and Beran & Ghosh. The second topic is covered by Nanayakkara & Cressie and White & Stinchcombe.
Among specific models, linear regression is the prototype which continues to attract most interest. Inference techniques can be based on the two well-known classes of M-estimators, see Dollinger & Staudte and Markatou, Stahel & Ronchetti, or of R-estimators, see Hettmansperger & Naranjo and McKean & Sheather. Yohai, Stahel & Zamar propose a specific procedure which they hope to be useful for estimation and inference on a general-purpose level. Field & Ronchetti present a general technique for finding good approximations to distributions of M-type statistics in small samples. Estimation of multivariate location and scatter is the basis for much of multivariate statistics and has also received considerable attention, as shown by the contributions of Lopuhaä, Rousseeuw & van Zomeren, and Tyler. Marazzi and Neykov & Neytchev discuss the computation of M- and high breakdown point estimates for regression and multivariate location and scatter. In the time series context, Martin & Yohai describe a very robust procedure for autoregressive models. Akritas & Zubovic survey research on robustness in survival analysis. Finally, Boente & Fraiman apply M-estimation to the problem of nonparametric regression or smoothing.
Rather than going into more detail here, we refer to the "Mixed Questions and Comments" by Hampel and to the survey of "Research Directions in Robust Statistics" by Stahel with its extensive reference list.
Eleven papers in this volume are primarily concerned with diagnostics. Three papers, by Portnoy, Simonoff, and Atkinson & Weisberg, consider quite different approaches to finding multiple outliers. Two papers by Schall & Dunne and by Ledolter consider diagnostics for time series models.
Three papers are concerned with more theoretical and general approaches to diagnostics. Lawrance compares the two standard approaches to diagnostics, through case deletion and local perturbations. Geisser develops diagnostics from a Bayesian predictivist perspective. Tsai & Wu consider comparison of approximations that arise in diagnostic analysis for relatively complicated models.
The two remaining papers consider diagnostic issues that are different from those of the preceding papers. O'Brien presents the approach to diagnostics that is included in the expert system front end GLIMPSE to the program GLIM. Finally, Cook & Weisberg discuss added variable plots, and their relationship to general graphical and diagnostic issues.
Two contributions to the interplay between robustness and diagnostics are Tukey's "Graphical Displays for Alternate Regression Fits" (with different robust fits in mind), and the "Regression Diagnostics for Rank Based Methods" by McKean, Sheather & Hettmansperger. We hope that the workshop and these Proceedings will stimulate further research in this direction.
A brief index is included at the end of each volume for ease of reference to the main topics of the papers.
These two volumes contain the proceedings of the first part of the 1989 summer program at IMA. The remaining part, on graphics and computing in statistics, will have a separate volume of proceedings. This was the first program in statistics sponsored by the IMA. We are sure that all the participants join us in hoping that it will not be the last. The IMA provides a positive atmosphere conducive to productive interchange of ideas by participants. We are most grateful to all the staff members who make this possible, in particular to Avner Friedman, Director of IMA, and to Willard Miller, Associate Director, for the high standards they set, and to Patricia Brick, Stephan Skogerboe, Kaye Smith, and Marise Widmer, who collected the papers and did the necessary typing for these volumes. Ram Gnanadesikan and Peter Huber, as members of the IMA board, were instrumental in getting this program started. We would also like to acknowledge the efforts of the organizing committee, which included Andreas Buja, Noel Cressie, Thomas P. Hettmansperger, Peter J. Huber, R. Douglas Martin, Werner Stuetzle, Luke Tierney, Paul A. Tukey, Edward Wegman, Allan R. Wilks and ourselves.
Zurich and St. Paul, October 1990 W. A. Stahel and S. Weisberg
CONTENTS
DIRECTIONS IN ROBUST STATISTICS AND DIAGNOSTICS: PART II
Foreword
Preface
Small sample properties of robust analyses of linear models based on R-estimates: A survey
Joseph W. McKean and Simon J. Sheather
Regression diagnostics for rank-based methods II
Joseph W. McKean, Simon J. Sheather and Thomas P. Hettmansperger
Robust multivariate spectral analysis of the EEG
Luciano Molinari and Guido Dumermuth
Configural polysampling
Stephan Morgenthaler
Robustness to unequal scale and other departures from the classical linear model
Nuwan Nanayakkara and Noel Cressie
Unmasking multivariate outliers and leverage points by means of BMDP3R
Neyko M. Neykov and Plamen N. Neytchev
Glimpse: An assessor of GLM misspecification
Carl M. O'Brien
Regression quantile diagnostics for multiple outliers
Stephen Portnoy
Robust testing of functionals
Helmut Rieder
Robustness of the p-subset algorithm for regression with high breakdown point
Peter J. Rousseeuw and Gilbert W. Bassett, Jr.
Robust distances: Simulations and cutoff values
Peter J. Rousseeuw and Bert C. van Zomeren
Diagnostics for regression-ARMA time series
Robert Schall and Timothy T. Dunne
General approaches to stepwise identification of unusual values in data analysis
Jeffrey S. Simonoff
Research directions in robust statistics
Werner A. Stahel
Comparisons between first order and second order approximations in regression diagnostics
Chih-Ling Tsai and Xizhi Wu
Consumer datesware
John W. Tukey
Graphical displays for alternate regression fits
John W. Tukey
Some issues in the robust estimation of multivariate location and scatter
David E. Tyler
Adaptive efficient weighted least squares with dependent observations
Halbert White and Maxwell Stinchcombe
A procedure for robust estimation and inference in linear regression
Victor Yohai, Werner A. Stahel, and Ruben H. Zamar
Author index
Subject index
CONTENTS
DIRECTIONS IN ROBUST STATISTICS AND DIAGNOSTICS: PART I
Foreword
Preface
Survey of robust procedures for survival data
Michael G. Akritas and Yvonne Zubovic
Simulated annealing for the detection of multiple outliers using least squares and least median of squares fitting
Anthony C. Atkinson and Sanford Weisberg
Goodness of fit tests and long-range dependence
Jan Beran and Sucharita Ghosh
A functional approach to robust nonparametric regression
Graciela Boente and Ricardo Fraiman
Added variable plots in linear regression
R. Dennis Cook and Sanford Weisberg
Efficiency of reweighted least squares iterates
Michael B. Dollinger and Robert G. Staudte
An overview of small sample asymptotics
Christopher A. Field and Elvezio Ronchetti
Diagnostics, divergences and perturbation analysis
Seymour Geisser
Some mixed questions and comments on robustness
Frank Hampel
Some research directions in rank-based inference
Thomas P. Hettmansperger and Joshua D. Naranjo
Between robustness and diagnostics
Peter J. Huber
Dependence among observations: Consequences and methods to deal with it
Hans R. Künsch
Local and deletion influence
A.J. Lawrance
Outliers in time series analysis: Some comments on their impact and their detection
Johannes Ledolter
Breakdown point and asymptotic properties of multivariate S-estimators and T-estimators: A Summary
Hendrik P. Lopuhaä
Algorithms and programs for robust linear regression
Alfio Marazzi
Robust M-type testing procedures for linear models
Marianthi Markatou, Werner A. Stahel and Elvezio Ronchetti
Recent results on bias-robust regression estimates
Ricardo A. Maronna and Victor J. Yohai
Bias robust estimation of autoregression parameters
R. Douglas Martin and Victor J. Yohai