
THE FUTURE OF STATISTICS

Bradley Efron 1

Max H. Stein Professor, Stanford University, US

“Strange, as one gets older you’re expected to know more about the future.”

The history of statistics as a recognized discipline divides rather neatly at 1900, the year of Karl Pearson’s chi-square paper. Before then we are still close to the world of Quetelet, where huge census-level data sets are brought to bear on simple but important questions: Are there more male or female births? Is the murder rate rising? Then, as if on cue, the Twentieth Century rings in a focus on small-scale statistics. A team of intellectual giants, Fisher, Neyman, Hotelling, . . . , invent a theory of optimal inference, capable of wringing out every drop of collected information. The questions are still simple: Is treatment A better than treatment B? But the new methods are suited to the kinds of small data sets an individual scientist might collect.

What does this have to do with the future of statistics? Quite a bit, perhaps: the Twenty-First Century, again on cue, seems to have initiated a third statistical era. New technologies, exemplified by the microarray, permit scientists to collect their own huge data sets. But this is not a return to the age of Quetelet. The flood of data is now accompanied by a flood of questions, perhaps thousands of them, that the statistician is charged with answering together; not at all the setting Fisher et al. had in mind.

As a cruder summary of my already crude statistical history, we have

19th Century: Large data sets, simple questions
20th Century: Small data sets, simple questions
21st Century: Large data sets, complex questions

The future of statistics, or at least the next large chunk of future, will be preoccupied, I believe, with problems of large-scale inference raised in our revolutionary scientific environment. For example, how should one analyze 10,000 related hypothesis tests or 100,000 correlated estimates at the same time?
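To see the scale of the issue (an illustrative aside, not part of Efron's text), the sketch below simulates 10,000 tests in which every null hypothesis is true: roughly 500 of them come out "significant" at the conventional 5% level purely by chance, so one-at-a-time testing logic breaks down.

```python
import numpy as np

# Simulate 10,000 hypothesis tests in which every null hypothesis is true.
rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)          # null z-values

# Testing each at the conventional 5% level "discovers" ~500 false positives.
rejected = np.abs(z) > 1.96
print(f"rejections among 10,000 true nulls: {rejected.sum()}")  # about 500
```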

1 Past President, American Statistical Association (2004), Past President, Institute of Mathematical Statistics (1987–1988), Founding Editor, The Annals of Applied Statistics (2006–). Professor Efron is a member of the American Academy of Arts and Sciences (1983) and the National Academy of Sciences (1985). He has been awarded the Ford Prize, Mathematical Association of America (1978), MacArthur Award (1983), Wilks Medal, American Statistical Association (August 1990), Fisher Prize, Committee of Presidents of Statistical Societies (July 1996), Parzen Prize for Statistical Innovation, Texas A&M University (1998), and Noether Prize, American Statistical Association (2006). On May 29, 2007, he was awarded the National Medal of Science, the highest scientific honor of the United States, for his exceptional work in the field of statistics (especially for his invention of the bootstrap methodology).


STATISTICS – AN OVERVIEW

David Hand 1

President of the Royal Statistical Society
Professor, Department of Mathematics

Imperial College, London, UK

One can define statistics in various ways. My favourite definition is bipartite: statistics is both the science of uncertainty and the technology of extracting information from data. This definition captures the two aspects of the discipline: that it is about understanding (and indeed manipulating) chance, and also about collecting and analysing data to enable us to understand the world around us. More specifically, of course, statistics can have different aims, including prediction and forecasting, classification, estimation, description, summarisation, decision-making, and others.

Statistics has several roots, which merged to form the modern discipline. These include (i) the theory of probability, initially formalised around the middle of the seventeenth century in attempts to understand games of chance, and then put on a sound mathematical footing with Kolmogorov's axioms around 1930; (ii) surveys of people for governmental administrative and economic purposes, as well as work aimed at constructing life tables for insurance purposes; and (iii) the development of arithmetic methods for coping with measurement errors in areas like astronomy and mechanics, by people such as Gauss, in the eighteenth and nineteenth centuries.

This diversity of the roots of statistics has been matched by the changing nature of the discipline. This is illustrated by, for example, the papers which have appeared in the journal of the Royal Statistical Society (the journal was launched in 1838). In the earlier decades, there was a marked emphasis on social matters, which gradually gave way, around the turn of the century, to more mathematical material. The first half of the twentieth century then saw the dramatic development of deep and powerful ideas of statistical inference, which continue to be refined to the present day. In more recent decades, however, the computer has had an equally profound impact on the discipline. Not only has this led to the development of entirely new classes of methods, it has also put powerful tools into the hands of statistically unsophisticated users – users who do not understand the deep mathematics underlying the tools. As might be expected, this can be a mixed blessing: powerful tools in hands which understand and know how to use them properly can be a tremendous asset, but those same tools in hands which can misapply them may lead to misunderstandings.

Although the majority of statisticians are still initially trained in university

1 David Hand is Professor of Statistics at Imperial College, London. He previously held the Chair of Statistics at the Open University. Professor Hand is a Fellow of the Royal Statistical Society and of the British Academy, an Honorary Fellow of the Institute of Actuaries, and a Chartered Statistician. He is a past president of the International Federation of Classification Societies, and was President of the Royal Statistical Society for the 2008-2009 term, and again in 2010.


COMPONENTS OF STATISTICS

Sir Clive William John Granger1

(September 4, 1934 – May 27, 2009)
Formerly Professor Emeritus at the University of California, San Diego, USA

Winner of the Nobel Memorial Prize in Economic Sciences in 2003

The two obvious sub-divisions of statistics are: a. Theoretical Statistics and b. Practical Statistics.

1. The theoretical side is largely based on a mathematical development of probability theory (see Probability theory – outline) applicable to data, particularly the asymptotic properties of estimates (see Properties of estimators) which lead to powerful theorems such as the Central Limit Theorem. The aim is to put many practical approaches to data analysis (see also Categorical data analysis; Multivariate data analysis – overview; Exploratory data analysis; Functional data analysis) on a sound theoretical foundation and to develop theorems about the properties of these approaches. The theories are usually based on a number of assumptions that may or may not hold in practice.

2. Practical statistics considers the analysis of data, how the data can be summarized in useful fashions, and how relationships between sets of data from different variables can be described and interpreted. The amount and the quality of the data (see Data quality) that is available are essential features in this area. On occasions data may be badly constructed or terms may be missing, which makes analysis more complicated.

Descriptive statistics include means, variances, histograms, correlations, and estimates of quantiles, for example. There are now various types of statistics depending on the area of application. General statistics arose from considerations of gambling (see Statistics and gambling), agriculture (see Statistics in agriculture; Analysis of multivariate agricultural data), and health topics (see Statistical methods in medical research; Medical statistics) but eventually a number of specialized areas arose when it was realized that these areas contained special types of data which required their own methods of analysis. Examples are:

1. Biometrics (see Biostatistics), from biological data which required different forms of measurement and associated tests;

2. Econometrics, for which variables may or may not be related with a time gap; data can be in the form of time series (particularly in economics and finance) or in large panels (see Panel data) in various parts of economics. The techniques developed over a wide range and the ideas have spread into other parts of statistics.

1 In 2003 Professor Granger was awarded the Nobel Memorial Prize in Economic Sciences (with Professor Robert Engle) for methods of analyzing economic time series with common trends (cointegration). Granger was knighted in 2005.

Professor Granger had sent his contributed entry on June 2, 2008, excusing himself for not writing a bit longer, "Lexicon" kind of paper: "I have never written anything for a 'Lexicon' before and so have failed in my attempt to be helpful, but I do attach a page or so. I wish you good luck with your effort." We are immensely thankful for his unselfish contribution to the prosperity of this project.


DESIGN OF EXPERIMENTS: A PATTERN OF PROGRESS

David R. Cox 1

Professor, Nuffield College, Oxford OX1 1NF, UK

The preceding article (Hinkelman, 2010) sets out the basic statistical principles of experimental design. This supplementary note comments on the historical development of the subject.

Careful experimentation has a long history, perhaps especially in the physical sciences. There is, however, a long history also of experimentation in fields as diverse as agriculture and clinical medicine. The first systematic discussion of experimental design in the presence of substantial haphazard variation seems to be that of Fisher (1926), later developed in his book (Fisher, 1935). He set out four principles:

• error control, for example by some form of matching to compare like with like

• independent replication to improve precision and allow its estimation

• randomization to achieve a number of aims, notably avoidance of selection biases

• factorial design to improve the efficiency of experimentation and to allow the exploration of interactions.

The relative importance of these four principles varies between subject-matter fields. This accounts to some extent for differences in how the subject has developed when directed towards, say, agricultural field trials as contrasted with some other fields of application.

In the period up to 1939 these ideas were extensively developed, notably by Fisher's friend and colleague, F. Yates, whose monograph (Yates, 1937) is a little known masterpiece of the statistical literature. The focus and impact of this work was primarily but by no means exclusively agricultural.

In the 1950s a strand of new ideas entered from the chemical process industries, notably with the work of G.E.P. Box and his associates; see, for example, Box and Draper (1987). The differences were not so much that factors in a factorial experiment mostly had quantitative levels as that emphasis shifted from the estimation of factorial effects to the response surface of mean outcome considered as a smooth function of the factor level. This led to a richer family of designs and to an

1 Professor Cox was knighted by Queen Elizabeth II in 1985 and became an Honorary Fellow of the British Academy in 1997. He has served as President of the Bernoulli Society (1979–81), of the Royal Statistical Society (1980–82), and of the International Statistical Institute (1995–1997). He has been awarded the Guy Medal in Silver, Royal Statistical Society (1961), Guy Medal in Gold, Royal Statistical Society (1973), Weldon Memorial Prize, University of Oxford (1984), Kettering Prize and Gold Medal for Cancer Research (1990), and the Marvin Zelen Leadership Award, Harvard University (1998). Professor Cox holds 20 honorary doctorates.


THE PRINCIPLES UNDERLYING ECONOMETRIC ESTIMATORS FOR IDENTIFYING CAUSAL EFFECTS 1

James J. Heckman2

Winner of the Nobel Memorial Prize in Economic Sciences in 2000

Henry Schultz Distinguished Service Professor of Economics

The University of Chicago and University College Dublin

This paper reviews the basic principles underlying the identification of conventional econometric evaluation estimators for causal effects and their recent extensions. Heckman [2008] discusses the econometric approach to causality and compares it to conventional statistical approaches. This paper considers alternative methods for identifying causal models.

The paper is in four parts. The first part presents a prototypical economic choice model that underlies econometric models of causal inference. It is a framework that is useful for analyzing and motivating the economic assumptions underlying alternative estimators. The second part discusses general identification assumptions for leading econometric estimators at an intuitive level. The third part elaborates the discussion of matching in the second part. Matching is widely used in applied work and makes strong informational assumptions about what analysts know relative to what the people they analyze know. The fourth part concludes.

1 A Prototypical Policy Evaluation Problem

Consider the following prototypical policy problem. Suppose a policy is proposed for adoption in a country. It has been tried in other countries and we know outcomes there. We also know outcomes in countries where it was not adopted. From the historical record, what can we conclude about the likely effectiveness of

1 University of Chicago, Department of Economics, 1126 E. 59th Street, Chicago IL 60637, USA. This research was supported by NSF: 97-09-873, 00-99195, and SES-0241858 and NICHD: R01-HD32058-03. I thank Mohan Singh, Sergio Urzua, and Edward Vytlacil for useful comments.

2 Professor Heckman shared the Nobel Memorial Prize in Economics in 2000 with Professor Daniel McFadden for his development of theory and methods for analyzing selective samples. Professor Heckman has also received numerous awards for his work, including the John Bates Clark Award of the American Economic Association in 1983, the 2005 Jacob Mincer Award for Lifetime Achievement in Labor Economics, the 2005 University College Dublin Ulysses Medal, the 2005 Aigner award from the Journal of Econometrics, and the Gold Medal of the President of the Italian Republic, awarded by the International Scientific Committee, in 2008. He holds six honorary doctorates.


ANDERSON-DARLING TESTS OF GOODNESS-OF-FIT

Theodore W. Anderson 1 2

Professor of Statistics and Economics, Emeritus, Stanford University, USA

1 Introduction.

A “goodness-of-fit” test is a procedure for determining whether a sample of n observations, x_1, . . . , x_n, can be considered as a sample from a given specified distribution. For example, the distribution might be a normal distribution with mean 0 and variance 1. More generally, the specified distribution is defined as

F(x) = \int_{-\infty}^{x} f(y)\, dy, \qquad -\infty < x < \infty, \qquad (1)

where f(y) is a specified density. This density might be suggested by a theory, or it might be determined by a previous study of similar data.

When X is a random variable with distribution function F(x) = Pr{X ≤ x}, then U = F(X) is a random variable with distribution function

\Pr\{U \le u\} = \Pr\{F(X) \le u\} = u, \qquad 0 \le u \le 1. \qquad (2)

The model specifies u_1 = F(x_1), . . . , u_n = F(x_n) as a sample from the distribution (2), that is, the standard uniform distribution on the unit interval [0, 1], written U(0, 1).

A test of the hypothesis that x_1, . . . , x_n is a sample from a specified distribution, say F_0(x), is equivalent to a test that u_1 = F_0(x_1), . . . , u_n = F_0(x_n) is a sample from U(0, 1). Define the empirical distribution function as

F_n(x) = \frac{k}{n}, \qquad -\infty < x < \infty, \qquad (3)

if k of (x_1, . . . , x_n) are ≤ x. A goodness-of-fit test is a comparison of F_n(x) with F_0(x). The hypothesis H_0 : F(x) = F_0(x), -\infty < x < \infty, is rejected if F_n(x) is very different from F_0(x). “Very different” is defined here as

W_n^2 = n \int_{-\infty}^{\infty} \left[ F_n(x) - F_0(x) \right]^2 \psi\left[ F_0(x) \right] dF_0(x)
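The weight function ψ(u) = 1/(u(1 − u)) in W_n^2 yields the Anderson-Darling statistic. As a purely illustrative sketch (added here, not part of the original entry), the snippet below computes that statistic through its usual computational form for a fully specified null distribution, taken to be the standard normal; the data are simulated.

```python
import numpy as np
from scipy.stats import norm

def anderson_darling(x, cdf=norm.cdf):
    """A^2 for a fully specified null F_0 (here the standard normal)."""
    u = np.sort(cdf(np.asarray(x)))      # u_(1) <= ... <= u_(n), with u_i = F_0(x_i)
    n = len(u)
    i = np.arange(1, n + 1)
    # Computational form of n * integral of [F_n - F_0]^2 / (F_0 (1 - F_0)) dF_0
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))

rng = np.random.default_rng(1)
print(anderson_darling(rng.standard_normal(200)))   # small values indicate a good fit
```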

1 The assistance of Michael A. Stephens is gratefully acknowledged.
2 Past President, Institute of Mathematical Statistics (1963), Vice President, American Statistical Association (1971–73), Fellow of the American Academy of Arts and Sciences (elected 1974), Member of the National Academy of Sciences (elected 1976). Professor Anderson has been awarded the R. A. Fisher Award of the Committee of Presidents of Statistical Societies (1985) and the Samuel S. Wilks Memorial Medal, American Statistical Association (1988). He holds four honorary doctorates.


FUZZY SETS — AN INTRODUCTION

Madan Lal Puri 1

Professor Emeritus, Mathematics Department, Indiana University, USA

1. Introduction

In everyday life we often deal with imprecisely defined properties or quantities – e.g. “a few books”, “a long story”, “a popular teacher”, “a tall man”, etc. More often than not, the classes of objects which we encounter in the real physical world do not have precisely defined criteria of membership. For example, consider the class of animals. This class clearly includes dogs, horses, birds, etc. as its members, and clearly excludes rocks, fluids, plants, etc. However, such objects as starfish, bacteria, etc. have an ambiguous status with respect to the class of animals. The same kind of ambiguity arises in the case of a number such as 10 in relation to the “class” of all numbers which are much greater than 1.

Clearly, the class of all real numbers which are much greater than 1, or “the class of tall men”, do not constitute classes in the usual mathematical sense of these terms. Yet, the fact remains that such imprecisely defined “classes” play an important role in human thinking, particularly in the domain of pattern recognition, communication of information, decision theory, control theory and medical diagnosis, among others.

The purpose of this note is to provide in a preliminary way some of the basic properties and implications of a concept which is being used more and more in dealing with the type of “classes” mentioned above. The concept in question is that of a “fuzzy set” with a continuum of grades of membership, the concept introduced by Zadeh (1965) in order to allow imprecisely defined notions to be properly formulated and manipulated.
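To make the notion concrete (an added sketch, not part of the original note), a fuzzy set is described by a membership function with values in [0, 1] rather than in {0, 1}. The particular functional form below for "real numbers much greater than 1" is an arbitrary illustrative choice, not Zadeh's definition.

```python
import numpy as np

def much_greater_than_one(x):
    """Illustrative membership function: 0 at x <= 1, approaching 1 as x grows."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 1.0, 0.0, 1.0 - 1.0 / (1.0 + (x - 1.0) ** 2))

# Grades of membership form a continuum between 0 (clearly out) and 1 (clearly in).
for v in (1, 2, 5, 10, 100):
    print(v, round(float(much_greater_than_one(v)), 3))
```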

Over the past 20–25 years there has been a tremendous growth of literature on fuzzy sets, amounting by now to over 2000 papers and several textbooks; there is even a journal devoted to this subject.

This paper is intended to provide a brief survey of some of the basic concepts of fuzzy sets and related topics.

1 Professor Puri was ranked the fourth most prolific statistician in the world for his writings in the top statistical journals in a 1997 report by the Natural Sciences and Engineering Research Council of Canada. Among statisticians in universities which do not have separate departments of statistics, Puri was ranked number one in the world by the same report. Puri has received a great many honours for his outstanding contributions to statistics and we mention only a few. Professor Puri twice received the Senior U.S. Scientist Award from Germany's Alexander von Humboldt Foundation, and he was honored by the German government in recognition of past achievements in research and teaching. Madan Puri has been named the recipient of the 2008 Gottfried E. Noether Senior Scholar Award (an annual, international prize honoring the outstanding statisticians across the globe), for “outstanding contributions to the methodology and/or theory and teaching of nonparametric statistics that have had substantial, sustained impact on the subject, its practical applications and its pedagogy”. According to Professor Puri, his greatest honor came in 2003 when the International Science Publishers published “Selected Collected Works of Madan L. Puri”, a series of three volumes, each containing about 800 pages.


Bayesian Analysis or Evidence Based Statistics

D.A.S. Fraser 1

Professor, Department of Statistics
University of Toronto, Canada M5S 3G3

Introduction

The original Bayes proposal leads to likelihood and confidence for many simple examples. More generally it gives approximate confidence, but to achieve exact confidence reliability it needs refinement of the argument and needs more than just the usual minimum of the likelihood function from observed data. A general Bayes approach provides a flexible and fruitful methodology that has blossomed, in contrast to the widely-based long-standing frequentist testing with focus on the 5% level. We examine some key events in the evolution of the Bayes approach promoted as an alternative to the present likelihood based frequentist analysis of data with model, the evidence-based approach of central statistics. And we are led to focus on the bane of Bayes: parameter curvature.

1. Bayes, 1763

Bayes (1763) examined the Binomial model

f(y; \theta) = \binom{n}{y} \theta^{y} (1 - \theta)^{n-y}

and proposed the flat prior π(θ) = 1 on [0, 1]. Then with data y^0 he used a lemma from probability calculus to derive the posterior π(θ | y^0) = c θ^{y^0}(1 − θ)^{n−y^0} on [0, 1]. And then for an interval, say (θ, 1), he calculated the integral of the posterior,

s(\theta) = \int_{\theta}^{1} \theta^{y^0} (1 - \theta)^{n - y^0}\, d\theta \Big/ \int_{0}^{1} \theta^{y^0} (1 - \theta)^{n - y^0}\, d\theta,

and referred to it as the probability that the parameter belonged to the interval (θ, 1). Many endorsed the proposed calculation and many disputed it.
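Numerically (an added sketch, not part of Fraser's text), the flat prior makes the posterior a Beta(y^0 + 1, n − y^0 + 1) distribution, so s(θ) is just its upper-tail probability; the data values below are arbitrary.

```python
from scipy.stats import beta

# Flat prior on theta plus Binomial(n, theta) data y0 gives a
# Beta(y0 + 1, n - y0 + 1) posterior; s(theta) is its upper tail.
def s(theta, y0, n):
    return beta.sf(theta, y0 + 1, n - y0 + 1)   # posterior Pr(parameter > theta)

# Arbitrary illustrative data: 7 successes in 10 trials.
print(s(0.5, y0=7, n=10))   # posterior probability the parameter exceeds 0.5
```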

As part of his presentation he used an analogy. A ball was rolled on a level table, perhaps an available billiard table, and was viewed as having equal probability of stopping in any equal sized area. The table was then divided conceptually by a North-South line through the position where the ball stopped, with area θ to the West and (1 − θ) to the East. The ball was then rolled n further times and the number y^0 of times that it stopped left of the line was observed. In the analogy itself, the posterior probability calculation given data seems entirely appropriate.

1 Among many awards, Professor Fraser received the First Gold Medal, Statistical Society of Canada (1985), R.A. Fisher Award and Prize, American Statistical Association (1990), and Gold Medal, Islamic Statistical Society (2000).


CHERNOFF-SAVAGE THEOREM

Herman Chernoff 1

Professor Emeritus, Department of Statistics
Harvard University, USA

Hodges and Lehmann [1] conjectured in 1956 that the nonparametric competitor to the t-test, the Fisher-Yates-Terry-Hoeffding or c_1 test [2], was as efficient as the t-test for normal alternatives and more efficient for nonnormal alternatives.

To be more precise, we assume that we have two large samples, of sizes m and n with N = m + n, from two distributions which are the same except for a translation parameter which differs by an amount δ. To test the hypothesis that δ = 0 against one sided alternatives, we use a test statistic of the form

T_N = m^{-1} \sum_{i=1}^{N} E_{Ni} z_{Ni}

where z_{Ni} is one or zero depending on whether the ith smallest of the N observations is from the first or the second sample. For example the Wilcoxon test is of the above form with E_{Ni} = i/N. It was more convenient to represent the test in the form

T_N = \int_{-\infty}^{\infty} J_N[H_N(x)]\, dF_m(x),

where F_m and G_n are the two sample cdf’s, λ_N = m/N and H_N = λ_N F_m + (1 − λ_N) G_n. These two forms are equivalent when E_{Ni} = J_N(i/N).
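As a small numerical illustration (an added sketch, not part of the original entry), the first form of T_N can be computed directly with the Wilcoxon scores E_{Ni} = i/N on simulated translation-alternative data.

```python
import numpy as np

def t_n(x, y):
    """Rank statistic T_N = m^{-1} sum_i E_{Ni} z_{Ni} with Wilcoxon scores E_{Ni} = i/N."""
    m, n = len(x), len(y)
    N = m + n
    pooled = np.concatenate([x, y])
    order = np.argsort(pooled)                 # indices of the ordered observations
    z = (order < m).astype(float)              # z_{Ni} = 1 if the ith smallest is from sample 1
    e = np.arange(1, N + 1) / N                # Wilcoxon scores E_{Ni} = i/N
    return np.sum(e * z) / m

rng = np.random.default_rng(2)
x = rng.normal(0.5, 1.0, size=60)              # first sample, shifted by delta = 0.5
y = rng.normal(0.0, 1.0, size=40)              # second sample
print(t_n(x, y))                               # values above 1/2 suggest a positive shift
```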

The proof of the conjecture required two arguments. One was the asymptotic normality of T when δ ≠ 0. The Chernoff-Savage theorem [3] establishes the asymptotic normality, under appropriate regularity conditions on J_N, satisfied by c_1, using an argument where F_m and G_n are approximated by continuous time Gaussian processes, and the errors due to the approximation are shown to be relatively small.

The second argument required a variational result using the Pitman measure of local efficacy of the test of δ = 0, which may be calculated as a function of the underlying distribution. For distributions with variance 1, the efficiency of the test relative to the t-test is minimized with a value of 1 for the normal distribution. It follows that the c_1 test is as efficient as the t-test for normal translation alternatives and more efficient for nonnormal translation alternatives.

References

1 Professor Chernoff was President of the Institute of Mathematical Statistics (1967–1968) and is an elected member of both the American Academy of Arts and Sciences and the National Academy of Sciences. He has been honored for his contributions in many ways. He is a recipient of the Townsend Harris Medal and Samuel S. Wilks Medal. He was named Statistician of the Year, Boston Chapter of the ASA (1991). He holds four honorary doctorates.


DATA PRIVACY AND CONFIDENTIALITY

Stephen E. Fienberg1

Maurice Falk University Professor, Department of Statistics and Machine Learning Department

Carnegie Mellon University, USA

Aleksandra B. Slavkovic2

Assistant Professor, Department of Statistics
The Pennsylvania State University, USA

Data privacy is an overarching concern in modern society, as government and non-government agencies alike collect, archive, and release increasing amounts of potentially-sensitive personal data. Data owners or stewards, in the case of statistical agencies, often critically evaluate both the type of data that they make publicly available and the format of the data product releases. The statistical challenge is to discover how to release important characteristics of existing databases without compromising the privacy of those whose data they contain.

Modern databases, however, pose new privacy problems due to the types of information they hold and their size. In addition to traditional types of information contained in censuses, surveys, and medical and public health studies, contemporary information repositories store social network data (e.g., cell phone and Facebook data), product preferences (e.g., from commercial vendors), web search data, and other statistical information that was hitherto unavailable in digital format. The information in modern databases is also more commercially exploitable than pure census data (e.g., credit cards, purchase histories, medical history, mobile device locations). As the amount of data in the public realm accumulates and record-linkage methodologies improve, the threat to confidentiality and privacy magnifies. Repeated database breaches demonstrate that removing obviously identifying attributes such as names is insufficient to protect privacy (e.g., Narayanan and Shmatikov (2006); Backstrom et al. (2007)). Even supposed anonymization techniques can leak sensitive information when the intruder has modest partial knowledge about the data from external sources (e.g., Coull et al. (2008); Ganta

1 Co-Founder of the Journal of Privacy and Confidentiality, Past President of the Institute of Mathematical Statistics (1998-99), Past President of the International Society for Bayesian Analysis (1996-97), Elected member of The National Academy of Sciences, USA. Supported in part by National Science Foundation grant DMS-0631589 and U.S. Army Research Office Contract W911NF-09-1-0360 to Carnegie Mellon University.

2 Supported in part by National Science Foundation grant SES-0532407 to the Department of Statistics, Pennsylvania State University.


ECONOMIC STATISTICS

Yasuto Yoshizoe

Professor, College of Economics, Aoyama Gakuin University, Tokyo, Japan

President of the Japan Statistical Society (2009 — 2010)

President, Statistics Council, Japan

In this article, we regard economic statistics as a branch of applied statistics which deals with the following topics: (1) collection of statistics on socioeconomic conditions, (2) compilation of surveys and registered records to produce various economic indicators, such as the consumer price index or GDP, (3) evaluation of economic indicators from the viewpoint of reliability.

Economic statistics is closely related to official statistics, since most of the statistics on society and the economy are provided by official organizations like national statistical offices or intergovernmental organizations such as the United Nations, OECD, and World Bank. On the other hand, economic statistics is different from econometrics in a narrow sense. Typically, the objective of econometrics lies in developing theory and its applications to various economic data. In contrast, economic statistics places more emphasis on the quality of data before applying sophisticated methods to analyze them. In other words, we are more interested in appropriate interpretation of the data, paying attention to their detailed characteristics.

Here, we describe some typical issues from economic statistics: censuses, sample surveys, index numbers, system of national accounts, followed by an illustrative example.

1 Censuses — complete enumeration

Consumers and producers are the two major components considered in economics. Correspondingly, the two fundamental types of information in economic statistics concern (1) population and households, and (2) firms and establishments. The U.S. Census Bureau, http://www.census.gov/econ/, provides definitions of firms and establishments among other related concepts.

To construct reliable statistics on these subjects, complete enumeration is required. A census is the procedure of collecting information about all members of a population. In many countries, population censuses are carried out periodically. In contrast, information on firms and establishments is obtained either by statistical surveys (so-called economic censuses) or by registered information collected through some legal requirement.

The complete enumeration is necessary to provide accurate information for small areas. In planning the location of elementary schools, hospitals or care agencies for elderly people, local governments need information for small areas. If the censuses or registration records provide accurate information, local governments can depend on them in making important policies.

Another role of the census is that it provides a list of households or firms/establishments.


AGGREGATION SCHEMES — REVISIT

Devendra Chhetry

President of the Nepal Statistical Association (NEPSA)
Professor and Head, Central Department of Statistics

Tribhuvan University, Kirtipur Campus, Kathmandu, Nepal

Given a data vector x = (x_1, x_2, . . . , x_n) and a weight vector w = (w_1, w_2, . . . , w_n), there exist three aggregation schemes in the area of statistics which under certain assumptions generate three well-known measures of location: the arithmetic mean (AM), geometric mean (GM), and harmonic mean (HM), where it is implicitly understood that the data vector x contains values of a single variable. Among these three measures, AM is the most frequently used in statistics, for theoretical reasons. It is well known that AM ≥ GM ≥ HM, where equality holds only when all components of x are equal.

In recent years, some of these three and a new aggregation scheme are being practiced in the aggregation of development or deprivation indicators by extending the definition of the data vector to a vector of indicators, in the sense that it contains measurements of development or deprivation of several sub-population groups or measurements of several dimensions of development or deprivation. The measurements of development or deprivation are either available in the form of percentages or need to be transformed into unit-free indices. The Physical Quality of Life Index (Morris 1979), Human Development Index (UNDP 1991), Gender-related Development Index (UNDP 1995), Gender Empowerment Measure (UNDP 1995), and Human Poverty Index (UNDP 1997) are some of the aggregated indices of several dimensions of development or deprivation.

In developing countries, aggregation of development or deprivation indicators is a challenging task, mainly for two reasons. First, indicators usually display large variations or inequalities in the achievement of development or in the reduction of deprivation across the sub-populations or across the dimensions of development or deprivation within a region. Second, during the process of aggregation it is desired to incorporate the public aversion to social inequalities or, equivalently, the public preference for social equalities. Public aversion to social inequalities is essential for development workers or planners of developing countries for bringing marginalized sub-populations into the mainstream by monitoring and evaluation of development works. Motivated by this problem, Anand and Sen (UNDP 1995) introduced the notion of the gender-equality sensitive indicator (GESI).

In societies with an equal proportion of female and male population, for example, the AM of 60 and 30 percent male and female literacy rates is the same as that of 50 and 40 percent, showing that the AM fails to incorporate the public aversion to gender inequality due to the AM's built-in problem of perfect substitutability, in the sense that the 10 percentage point decrease in the female literacy rate in the former society as compared to the latter is substituted by the 10 percentage point increase in the male literacy rate. The GM or HM, however, incorporates the public aversion to gender inequality because they do not possess the perfect substitutability property. Instead
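A quick numerical check of the literacy example (an added sketch, not part of the original entry): the AM cannot distinguish the two societies, while the GM and HM both penalize the less equal one.

```python
import numpy as np

def am(x):  return float(np.mean(x))
def gm(x):  return float(np.exp(np.mean(np.log(x))))
def hm(x):  return len(x) / float(np.sum(1.0 / np.asarray(x, dtype=float)))

unequal = [60.0, 30.0]      # male / female literacy rates, society 1
more_equal = [50.0, 40.0]   # male / female literacy rates, society 2

for name, agg in [("AM", am), ("GM", gm), ("HM", hm)]:
    print(name, round(agg(unequal), 2), round(agg(more_equal), 2))
# AM is 45 for both; GM and HM are lower for the more unequal society.
```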


INVERSE GAUSSIAN DISTRIBUTION

Kalev Pärna
President of the Estonian Statistical Society

Professor, Head of the Institute of Mathematical Statistics
Chair of Probability Theory Department

University of Tartu, Estonia

The inverse Gaussian distribution (IG) (also known as the Wald distribution) is a two-parameter continuous distribution given by its density function

f(x; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi}}\; x^{-3/2} \exp\left\{ -\frac{\lambda}{2\mu^{2} x} (x - \mu)^{2} \right\}, \qquad x > 0.

The parameter μ > 0 is the mean and λ > 0 is the shape parameter. For a random variable X with inverse Gaussian distribution we write X ∼ IG(μ, λ).

The inverse Gaussian distribution describes the distribution of the time a Brownian motion with positive drift takes to reach a given positive level. To be precise, let X_t = νt + σW_t be a Brownian motion with drift ν > 0 (here W_t is the standard Brownian motion). Let T_a be the first passage time for a fixed level a > 0 by X_t. Then T_a has an inverse Gaussian distribution, T_a ∼ IG(a/ν, a²/σ²).

The inverse Gaussian distribution was first derived by E. Schrödinger in 1915. It belongs to a wider family of Tweedie distributions.

Characteristics of IG distribution

The cumulative distribution function of IG equals

F(x) = \Phi\left( \sqrt{\frac{\lambda}{x}} \left( \frac{x}{\mu} - 1 \right) \right) + \exp\left( \frac{2\lambda}{\mu} \right) \Phi\left( -\sqrt{\frac{\lambda}{x}} \left( \frac{x}{\mu} + 1 \right) \right),

where Φ(·) is the c.d.f. of the standard normal distribution. The characteristic function of IG is

φ(t) = \exp\left\{ \frac{\lambda}{\mu} \left( 1 - \sqrt{1 - \frac{2\mu^{2} i t}{\lambda}} \right) \right\}

and the moment generating function is

m(t) = \exp\left\{ \frac{\lambda}{\mu} \left( 1 - \sqrt{1 - \frac{2\mu^{2} t}{\lambda}} \right) \right\}.

Using the latter, the first four raw moments (i.e. \alpha_n \equiv E(X^n) = \partial^{n} m(t)/\partial t^{n}\,\big|_{t=0})
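As an illustrative numerical check (an added sketch, not part of the original entry), the density and distribution function above can be coded directly; a numerical derivative of F recovers f.

```python
import numpy as np
from scipy.stats import norm

def ig_pdf(x, mu, lam):
    """Density of IG(mu, lam) as given above."""
    return np.sqrt(lam / (2 * np.pi)) * x ** -1.5 * np.exp(-lam * (x - mu) ** 2 / (2 * mu ** 2 * x))

def ig_cdf(x, mu, lam):
    """Distribution function of IG(mu, lam) as given above."""
    r = np.sqrt(lam / x)
    return norm.cdf(r * (x / mu - 1)) + np.exp(2 * lam / mu) * norm.cdf(-r * (x / mu + 1))

mu, lam, x, h = 1.0, 2.0, 1.3, 1e-5
numeric_pdf = (ig_cdf(x + h, mu, lam) - ig_cdf(x - h, mu, lam)) / (2 * h)
print(ig_pdf(x, mu, lam), numeric_pdf)   # the two values agree closely
```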


KHMALADZE TRANSFORMATION

Hira L. Koul 1

President of the Indian Statistical Association (2009—2011)

Professor and Chair, Department of Statistics and Probability
Michigan State University, USA

Eustace Swordson

Background. Consider the problem of testing the null hypothesis that a set of random variables X_i, i = 1, . . . , n, is a random sample from a specified continuous distribution function (d.f.) F. Under the null hypothesis, the empirical d.f.

F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I\{X_i \le x\}

must “agree” with F . One way to measure this agreement is to use omnibus teststatistics from the empirical process

v_n(x) = \sqrt{n}\,(F_n(x) - F(x)).

The time transformed uniform empirical process

u_n(t) = v_n(x), \qquad t = F(x)

is an empirical process based on the random variables U_i = F(X_i), i = 1, . . . , n, that are uniformly distributed on [0, 1] under the null hypothesis. Hence, although the construction of u_n depends on F, the null distribution of this process does not depend on F any more (Kolmogorov (1933), Doob (1949)). From this sprang a principle, universally accepted in goodness of fit testing theory, that one should choose tests of the above hypothesis based on statistics A(v_n, F) which can be represented as statistics B(u_n) just from u_n. Any such statistic, like, for example, the weighted Cramér-von Mises statistics \int v_n^2(x)\, \alpha(F(x))\, dF(x), or Kolmogorov-Smirnov statistics \max_x |v_n(x)| / \alpha(F(x)), will have a null distribution free from F, and hence this distribution can be calculated once and used for many different F – still a very desirable property in present times, in spite of great advances in computational power. It is called the distribution free property of the test statistic.
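To illustrate the distribution free property (an added sketch, not part of the original entry), the Kolmogorov-Smirnov functional sup_x |v_n(x)| depends on the data only through the uniform variables U_i = F(X_i), so it can be computed from u_n alone; the exponential null below is an arbitrary choice.

```python
import numpy as np
from scipy.stats import expon

def ks_statistic(u):
    """sup_t |u_n(t)| = sqrt(n) * sup_t |G_n(t) - t| for uniform variables u."""
    u = np.sort(u)
    n = len(u)
    i = np.arange(1, n + 1)
    d = np.maximum(i / n - u, u - (i - 1) / n).max()   # sup norm of the uniform empirical d.f. deviation
    return np.sqrt(n) * d

# Null hypothesis: the data are Exp(1); transform to uniforms by U_i = F(X_i).
rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=100)
print(ks_statistic(expon.cdf(x)))   # its null distribution does not depend on F
```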

However, as first clarified by Gikhman (1954) and Kac, Kiefer and Wolfowitz (1955), this property is lost even asymptotically as soon as one is fitting a family of parametric d.f.'s. More precisely, suppose one is given a parametric family of d.f.'s

1 Professor Koul was President of the International Indian Statistical Association (2005–2006). He was awarded a Humboldt Research Award for Senior Scientists (1995). He is Co-Editor in Chief: Statistics and Probability Letters, Associate Editor: Applicable Analysis and Discrete Mathematics, Co-Editor: The J. Mathematical Sciences.


MIXTURE MODELS

Wilfried Seidel

President of the German Statistical Society
Professor, Helmut-Schmidt-Universität, D-22039 Hamburg, Germany

1 Introduction

Mixture distributions are convex combinations of "component" distributions. In statistics, these are standard tools for modelling heterogeneity in the sense that different elements of a sample may belong to different components. However, they may also be used simply as flexible instruments for achieving a good fit to data when standard distributions fail. As good software for fitting mixtures is available, these play an increasingly important role in nearly every field of statistics.

It is convenient to explain finite mixtures (i.e. finite convex combinations) as theoretical models for cluster analysis, but of course the range of applicability is not at all restricted to the clustering context. Suppose that a feature vector X is observed in a heterogeneous population, which consists of k homogeneous subpopulations, the "components". It is assumed that for i = 1, . . . , k, X is distributed in the i-th component according to a (discrete or continuous) density f(x, θ_i) (the "component density"), and all component densities belong to a common parametric family {f(x, θ), θ ∈ Θ}, the "component model". The relative proportion of the i-th component in the whole population is p_i, p_1 + · · · + p_k = 1. Now suppose that an item is drawn randomly from the population. Then it belongs to the i-th component with probability p_i, and the conditional probability that X falls in some set A is Pr(X ∈ A | θ_i), calculated from the density f(x, θ_i). Consequently, the marginal probability is

Pr(X ∈ A | P) = p_1 Pr(X ∈ A | θ_1) + · · · + p_k Pr(X ∈ A | θ_k)

with density

f(x, P) = p_1 f(x, θ_1) + · · · + p_k f(x, θ_k),   (1)

a "simple finite mixture" with parameter P = ((p_1, . . . , p_k), (θ_1, . . . , θ_k)). The components p_i of P are called "mixing weights", the θ_i "component parameters". For fixed k, let P_k be the set of all vectors P of this type, with θ_i ∈ Θ and nonnegative mixing weights summing up to one. Then P_k parameterizes all mixtures with not more than k components. If all mixing weights are positive and component densities are different, then k is the exact number of components. The set of all simple finite mixtures is parameterized by P_fin, the union of all P_k.
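As a concrete instance of (1) (an added sketch, not part of the original entry), the snippet below evaluates and samples from a two-component normal mixture; the weights and component parameters are arbitrary.

```python
import numpy as np
from scipy.stats import norm

# Simple finite mixture (1): f(x, P) = p1 f(x, theta1) + ... + pk f(x, thetak)
weights = np.array([0.3, 0.7])                              # mixing weights p_i, summing to one
means, sds = np.array([-2.0, 1.0]), np.array([1.0, 0.5])    # component parameters theta_i

def mixture_pdf(x):
    return sum(p * norm.pdf(x, m, s) for p, m, s in zip(weights, means, sds))

def mixture_sample(n, rng=np.random.default_rng(4)):
    comp = rng.choice(len(weights), size=n, p=weights)      # latent component labels
    return rng.normal(means[comp], sds[comp])

print(mixture_pdf(0.0), mixture_sample(5))
```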

This model can be extended in various ways. For example, all component densities may contain additional common parameters (variance parameters, say), they may depend on covariables (mixtures of regression models), and also the mixing weights may depend on covariables. Mixtures of time series models are also considered. Here I shall concentrate on simple mixtures, as all relevant concepts


MULTIVARIATE RANK PROCEDURES: PERSPECTIVES AND PROSPECTIVES

Pranab K. Sen 1

Cary C. Boshamer Professor of Biostatistics and Professor of Statistics and Operations Research

University of North Carolina, Chapel Hill, NC 27599, USA

Developments in multivariate statistical analysis have their genesis in the parametrics surrounding the multivariate normal distribution in the continuous case, while the product multinomial law dominates in discrete multivariate analysis. Characterisations of multi-normal distributions have provided a wealth of rigid mathematical tools leading to a very systematic evolution of mathematical theory laying down the foundation of multivariate statistical methods. Internal multivariate analyses, comprising principal component models, canonical correlation and factor analysis, are all based on appropriate invariance structures that exploit the underlying linearity of the interrelation of different characteristics, without depending much on underlying normality, and these tools are very useful in many areas of applied research, such as sociology, psychology, economics, and agricultural sciences. In the recent past, there has been a phenomenal growth of multivariate analysis in medical studies, clinical trials and bioinformatics, among others. The role of multinormality is being scrutinized increasingly in these contexts.

External multivariate analyses pertaining to multivariate analysis of variance (MANOVA) and covariance (MANOCOVA), classification and discrimination, among others, have their roots in the basic assumption of a multinormal distribution, providing some optimal, or at least desirable, properties of statistical inference procedures. Such optimal statistical procedures generally exist only when the multinormality assumption holds. Yet, in real life applications, the postulation of multinormality may not be tenable in a majority of cases. In the univariate case, there are some other distributions, some belonging to the so-called exponential family of densities and some not, for which exact statistical inference can be drawn, often being confined to a suitable subclass of statistical procedures. In the multivariate case, alternatives to multinormal distributions are relatively few and lack generality. As such, almost five decades ago, it was strongly felt that statistical procedures should be developed to bypass the stringent assumption of multinormality; this is the genesis of multivariate nonparametrics.

Whereas the classical normal theory likelihood based multivariate analysis exploited affine invariance, leading to some optimality properties, it has some short-

1 Professor Sen has more than 600 publications in Statistics, Probability Theory, Stochastic Processes, and Biostatistics in leading journals in these areas and has supervised more than 80 doctoral students (1969–2007). In 2002, he was the Senior Noether Awardee for his lifelong contributions to nonparametrics and received the Commemoration Medal from the Czech Union of Physicists and Mathematicians in 1998. He was the Founding (joint) Editor of Sequential Analysis (1982–1995) and also of Statistics and Decisions (1982–2002). He is the Chief Editor of Sankhya.


SADDLEPOINT APPROXIMATIONS

Juan Carlos Abril

President of the Argentinean Statistical Society

Professor, Universidad Nacional de Tucumán and
Consejo Nacional de Investigaciones Científicas
y Técnicas (CONICET), Argentina
email: [email protected]

1 Introduction

It is often required to approximate to the distribution of some statistic whose exact distribution cannot be conveniently obtained. When the first few moments are known, a common procedure is to fit a law of the Edgeworth type having the same moments as far as they are given. This method is often satisfactory in practice, but has the drawback that errors in the “tail” regions of the distribution are sometimes comparable with the frequencies themselves. Notoriously, the Edgeworth approximation can assume negative values in such regions.

The characteristic function of the statistic may be known, and the difficulty is then the analytical one of inverting a Fourier transform explicitly. It is possible to show that for some statistics a satisfactory approximation to its probability density, when it exists, can be obtained nearly always by the method of steepest descents. This gives an asymptotic expansion in powers of n^{-1}, where n is the sample size, whose dominant term, called the saddlepoint approximation, has a number of desirable features. The error incurred by its use is O(n^{-1}) as against the more usual O(n^{-1/2}) associated with the normal approximation.

2 The saddlepoint approximation

Let y = (y_1, . . . , y_n)′ be a vector of observations of n random variables with joint density f(y). Suppose that the real random variable S_n = S_n(y) has a density with respect to Lebesgue measure which depends on an integer n > N for some positive N. Let φ_n(z) = E(e^{izS_n}) be the characteristic function of S_n, where i is the imaginary unit. The cumulant generating function of S_n is ψ_n(z) = log φ_n(z) = K_n(T) with T = iz. Whenever the appropriate derivatives exist, let ∂^j ψ_n(z)/∂z^j denote the jth order derivative evaluated at z. The jth cumulant κ_{nj} of S_n, where it exists, satisfies the relation

i^{j} \kappa_{nj} = \frac{\partial^{j} \psi_{n}(0)}{\partial z^{j}}. \qquad (1)

It is assumed that the derivatives ∂^j ψ_n(z)/∂z^j exist and are O(n) for all z and j = 1, 2, . . . , r with r ≥ 4. We use here partial derivatives because the functions involved may depend on something else, a parameter vector for example.
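To give a flavour of the dominant term (an added sketch, not from the original entry, and specialised to the sample mean of i.i.d. Gamma variables rather than the general S_n above): with cumulant generating function K(s), the saddlepoint approximation to the density of the mean at x is \sqrt{n/(2\pi K''(\hat{s}))}\, \exp\{n[K(\hat{s}) - \hat{s}x]\}, where \hat{s} solves K'(\hat{s}) = x.

```python
import numpy as np
from scipy.special import gammaln

# Saddlepoint approximation to the density of the mean of n i.i.d. Gamma(alpha, 1)
# variables: K(s) = -alpha*log(1 - s), K'(s) = alpha/(1 - s), K''(s) = alpha/(1 - s)^2.
def saddlepoint_mean_density(x, n, alpha):
    s_hat = 1.0 - alpha / x                       # solves K'(s) = x
    K = -alpha * np.log(1.0 - s_hat)
    K2 = alpha / (1.0 - s_hat) ** 2
    return np.sqrt(n / (2 * np.pi * K2)) * np.exp(n * (K - s_hat * x))

def exact_mean_density(x, n, alpha):
    # The mean of n i.i.d. Gamma(alpha, 1) variables is Gamma(n*alpha, rate n).
    a = n * alpha
    return np.exp(a * np.log(n) + (a - 1) * np.log(x) - n * x - gammaln(a))

x, n, alpha = 1.2, 10, 2.0
print(saddlepoint_mean_density(x, n, alpha), exact_mean_density(x, n, alpha))
```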

Let h_n(x) be the density of the statistic X_n = n^{-1/2}\{S_n - E(S_n)\}. The


THIS IS ROBUST STATISTICS

Peter J. Huber 1

POB 198
7250 Klosters, Switzerland

E-mail: [email protected]

Keywords and phrases: optimal robustness, M-estimates, influence function, breakdown point, diagnostics, Bayesian robustness, optimal design, heuristics of robustness.

1. Introduction

The term “robust” was introduced into the statistical literature by G. E. P. Box (1953). By then, robust methods, such as trimmed means, had been in sporadic use for well over a century, see for example Anonymous (1821). However, J. W. Tukey (1960) was the first person to recognize the extreme sensitivity of some conventional statistical procedures to seemingly minor deviations from the assumptions, and to give an eye-opening example. His example, and his realization that statistical methods optimized for the conventional Gaussian model are unstable under small perturbations, were crucial for the subsequent theoretical developments initiated by P. J. Huber (1964) and F. R. Hampel (1968).

In the 1960s robust methods still were considered “dirty” by most. Therefore, to promote their reception in the statistical community it was crucial to mathematize the approach: one had to prove optimality properties, as was done by Huber's minimax results (1964, 1965, 1968), and to give a formal definition of qualitative robustness in topological terms, as was done by Hampel (1968, 1971). The first book-length treatment of theoretical robustness was that by Huber (1981, 2nd edition by Huber and Ronchetti 2009).

2. M-estimates and influence functions

With Huber (1964) we may formalize a robust estimation problem as a game between the Statistician and Nature. Nature can choose any distribution within some uncertainty region, say an ε-contamination neighborhood of the Gaussian distribution (i.e., a fraction ε of the observations comes from an arbitrary distribution). The Statistician can choose any M-estimate, that is, an estimate defined as the solution θ of an equation of the form

\sum \psi(x_i, \theta) = 0, \qquad (1)
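For location estimation with Huber's ψ, ψ(x, θ) = max(−k, min(k, x − θ)), equation (1) can be solved by iteratively reweighted averaging. The sketch below is an added illustration, not Huber's own code, and the tuning constant k = 1.345 is a conventional but arbitrary choice.

```python
import numpy as np

def huber_location(x, k=1.345, tol=1e-8, max_iter=100):
    """Solve sum_i psi(x_i - theta) = 0 for Huber's psi by iterative reweighting."""
    x = np.asarray(x, dtype=float)
    theta = np.median(x)                                         # robust starting value
    for _ in range(max_iter):
        r = x - theta
        w = np.minimum(1.0, k / np.maximum(np.abs(r), 1e-12))    # weights so that psi(r) = w * r
        new_theta = np.sum(w * x) / np.sum(w)
        if abs(new_theta - theta) < tol:
            break
        theta = new_theta
    return theta

rng = np.random.default_rng(5)
data = np.concatenate([rng.normal(0, 1, 95), [50.0] * 5])        # 5% gross errors
print(np.mean(data), huber_location(data))                       # the M-estimate resists the outliers
```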

1 Professor Huber is a fellow of the American Academy of Arts and Sciences. He received a Humboldt Award in 1988. He was a Professor of Statistics at ETH Zurich (Switzerland), Harvard University, Massachusetts Institute of Technology, and the University of Bayreuth (Germany).


MONTY HALL PROBLEM - SOLUTION

Richard D. Gill 1

President of the Dutch society for Statistics and Operations Research

Professor, Faculty of Science
Leiden University, Netherlands

1. INTRODUCTION

The Three Doors Problem, or Monty Hall Problem, is familiar to statisticians as a paradox in elementary probability theory often found in elementary probability texts (especially in their exercises sections). In that context it is usually meant to be solved by careful (and elementary) application of Bayes' theorem. However, in different forms, it is much discussed and argued about and written about by psychologists, game-theorists and mathematical economists, educationalists, journalists, lay persons, blog-writers, wikipedia editors.
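Under the textbook assumptions (the car is placed uniformly at random, the host always opens a different door hiding a goat, and always offers the switch), a direct simulation reproduces the classical answer that switching wins with probability 2/3; the sketch below is an added illustration, not part of Gill's article.

```python
import numpy as np

def monty_hall(n_games=100_000, seed=6):
    rng = np.random.default_rng(seed)
    car = rng.integers(0, 3, n_games)        # door hiding the car
    pick = rng.integers(0, 3, n_games)       # player's initial choice
    # The host opens a goat door different from the player's pick,
    # so switching wins exactly when the initial pick was wrong.
    stay_wins = np.mean(pick == car)
    switch_wins = np.mean(pick != car)
    return stay_wins, switch_wins

print(monty_hall())   # approximately (1/3, 2/3)
```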

In this article I will briefly survey the history of the problem and some of the approaches to it which have been proposed. My take-home message to you, dear reader, is that one should distinguish two levels to the problem.

There is an informally stated problem which you could pose to a friend at a party; and there are many concrete versions or realizations of the problem, which are actually the result of mathematical or probabilistic or statistical modelling. This modelling often involves adding supplementary assumptions chosen to make the problem well posed in the terms of the modeller. The modeller finds those assumptions perfectly natural. His or her students are supposed to guess those assumptions from various key words (like: “indistinguishable”, “unknown”) strategically placed in the problem re-statement. Teaching statistics is often about teaching the students to read the teacher's mind. Mathematical (probabilistic, statistical) modelling is, unfortunately, often solution driven rather than problem driven.

The very same criticism can, and should, be levelled at this very article! By cunningly presenting the history of The Three Doors Problem from my rather special point of view, I have engineered complex reality so as to convert the Three Doors Problem into an illustration of my personal Philosophy of Science, my Philosophy of Statistics.

This means that I have re-engineered the Three Doors Problem into an example of the point of view that Applied Statisticians should always be wary of the lure of Solution-driven Science. Applied Statisticians are trained to know Applied

1 Professor Gill has been selected as the 2010–2011 Distinguished Lorentz Fellow by the Netherlands Institute for Advanced Study in Humanities and Social Sciences. He is a member of the Royal Netherlands Academy of Arts and Sciences.


THE FRAILTY MODEL

Paul Janssen

President of the Belgian Statistical Society (2008—2010)

Professor, Center for Statistics
Hasselt University, Belgium

Luc Duchateau
Professor and Head, Department of Physiology and Biometry

Ghent University, Belgium

Survival data are often clustered; it follows that the independence assumption between event times does not hold. Such survival data occur, for instance, in cancer clinical trials, where patients share the same hospital environment. The shared frailty model can take such clustering in the data into account and provides information on the within cluster dependence. In such a model, the frailty is a measure for the relative risk shared by all observations in the same cluster. The model, a conditional hazard model, is given by

h_{ij}(t) = h_{0}(t)\, u_{i} \exp(x_{ij}^{t}\beta) = h_{0}(t) \exp(x_{ij}^{t}\beta + w_{i})

where h_{ij}(t) is the conditional (on u_i or w_i) hazard function for the jth observation (j = 1, . . . , n_i) in the ith cluster (i = 1, . . . , s); h_0(t) is the baseline hazard, β is the fixed effects vector of dimension p, x_{ij} is the vector of covariates and w_i (u_i) is the random effect (frailty) for the ith cluster. The w_i's (u_i's) are the actual values of a sample from a density f_W(.) (f_U(.)). Clustered survival data will be denoted by the observed (event or censoring) times y = (y_{11}, . . . , y_{s n_s})^t and the censoring indicators (δ_{11}, . . . , δ_{s n_s})^t. Textbook references dealing with shared frailty models include Hougaard (2000) and Duchateau and Janssen (2008).
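To make the model concrete (an added sketch, not part of the original entry), the snippet below simulates clustered event times from the conditional hazard above, assuming a Weibull baseline h_0(t) = ρ t^{ρ−1}, gamma frailties and a single binary covariate; all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
s, n_i = 20, 10                      # s clusters, n_i observations per cluster
theta, beta, rho = 0.5, 0.8, 1.5     # frailty variance, covariate effect, Weibull shape

u = rng.gamma(shape=1 / theta, scale=theta, size=s)      # gamma frailties, mean 1, variance theta
x = rng.binomial(1, 0.5, size=(s, n_i))                  # a single binary covariate x_ij

# Conditional hazard h_ij(t) = h_0(t) * u_i * exp(x_ij * beta) with h_0(t) = rho * t^(rho - 1),
# so the cumulative hazard is u_i * exp(x_ij * beta) * t^rho; inverting it gives event times.
e = rng.exponential(1.0, size=(s, n_i))                  # unit exponential variates
t = (e / (u[:, None] * np.exp(x * beta))) ** (1 / rho)   # clustered event times

print(t.shape, t.mean())   # times within a cluster share the frailty u_i, inducing dependence
```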

The one-parameter gamma density function

f_{U}(u) = \frac{u^{1/\theta - 1} \exp(-u/\theta)}{\theta^{1/\theta}\, \Gamma(1/\theta)}

(with mean one and variance θ) is often used as frailty density as it simplifies model fitting, especially if a parametric baseline hazard (parameterised by ξ) is assumed. The marginal likelihood for the ith cluster of the gamma frailty model can easily be obtained by first writing the conditional likelihood for the ith cluster and by then integrating out the gamma distributed frailty. With ζ = (ξ, θ, β), we have

LIKELIHOOD

Nancy Reid 1

Canada Research Chair in Statistical Theory and Applications
Professor, Department of Statistics, University of Toronto, Canada

1 Introduction

The likelihood function in a statistical model is proportional to the density function for the random variable to be observed in the model. Most often in applications of likelihood we have a parametric model f(y; θ), where the parameter θ is assumed to take values in a subset of R^k, and the variable y is assumed to take values in a subset of R^n: the likelihood function is defined by

L(\theta) = L(\theta; y) = c f(y; \theta), \qquad (1)

where c can depend on y but not on θ. In more general settings where the model is semi-parametric or non-parametric the explicit definition is more difficult, because the density needs to be defined relative to a dominating measure, which may not exist: see Van der Vaart (1996) and Murphy and Van der Vaart (1997). This article will consider only finite-dimensional parametric models.

Within the context of the given parametric model, the likelihood function measures the relative plausibility of various values of θ, for a given observed data point y. Values of the likelihood function are only meaningful relative to each other, and for this reason are sometimes standardized by the maximum value of the likelihood function, although other reference points might be of interest depending on the context.

If our model is f(y; θ) = \binom{n}{y} θ^y (1 − θ)^{n−y}, y = 0, 1, . . . , n; θ ∈ [0, 1], then the likelihood function is (any function proportional to)

L(θ; y) = θ^y (1 − θ)^{n−y}

and can be plotted as a function of θ for any fixed value of y. The likelihood function is maximized at θ = y/n. This model might be appropriate for a sampling scheme which recorded the number of successes among n independent trials that result in success or failure, each trial having the same probability of success, θ. Another example is the likelihood function for the mean and variance parameters when sampling from a normal distribution with mean μ and variance σ^2:

L(θ; y) = exp{−n log σ − (1/(2σ^2)) Σ(y_i − μ)^2},

1 Professor Reid is a Past President of the Statistical Society of Canada (2004–2005). During 1996–1997 she served as the President of the Institute of Mathematical Statistics. Among many awards, she received the Emanuel and Carol Parzen Prize for Statistical Innovation (2008) “for leadership in statistical science, for outstanding research in theoretical statistics and highly accurate inference from the likelihood function, and for influential contributions to statistical methods in biology, environmental science, high energy physics, and complex social surveys.” She was awarded the Gold Medal, Statistical Society of Canada (2009) and the Florence Nightingale David Award, Committee of Presidents of Statistical Societies (2009). She is an Associate Editor of Statistical Science (2008–), Bernoulli (2007–) and Metrika (2008–).
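As a hedged illustration of standardizing by the maximum, the sketch below evaluates the relative likelihood L(θ; y)/L(θ̂; y) for the binomial example above on a grid of θ values; the choices y = 7 and n = 10 are illustrative, and 0 < y < n is assumed so that the maximum θ̂ = y/n lies inside (0, 1).

# Relative (standardized) binomial likelihood, L(theta; y) = theta^y (1 - theta)^(n - y),
# divided by its maximum at theta = y/n. Illustrative values y = 7, n = 10.
import numpy as np

def relative_likelihood(y, n, theta):
    """Binomial log likelihood on a grid, exponentiated after subtracting its maximum."""
    log_lik = y * np.log(theta) + (n - y) * np.log(1.0 - theta)
    theta_hat = y / n                                        # maximum likelihood estimate
    log_lik_max = y * np.log(theta_hat) + (n - y) * np.log(1.0 - theta_hat)
    return np.exp(log_lik - log_lik_max)

theta_grid = np.linspace(0.01, 0.99, 99)
rel = relative_likelihood(7, 10, theta_grid)
print(theta_grid[np.argmax(rel)])   # close to the MLE y/n = 0.7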


GRAPHICAL ANALYSIS OF VARIANCE

G.E.P. Box1

Professor Emeritus, University of Wisconsin, Madison, USA

Walter Shewhart said: “Original data should be presented in a way that would preserve the evidence in the original data” (1939, page 88).

Frank Anscombe said: “A computer should make both calculation and graphs. Both kinds of output contribute to understanding” (1973, page 17).

And Yogi Berra said: “You can see a lot by just looking”.

1. A SIMPLE COMPARATIVE EXPERIMENT

As an illustration of graphical analysis of variance, Table 1a shows coagulation times for samples of blood drawn from 24 animals randomly allocated to four different diets A, B, C, D. Table 1b shows an analysis of variance for the data.

Table 1a. Coagulation times for blood drawn from 24 animals randomly allocated to four diets

                          Diets (Treatments)
                        A        B        C        D
                      62(20)   63(12)   68(16)   56(23)
                      60(2)    67(9)    66(7)    62(3)
                      63(11)   71(15)   71(1)    60(6)
                      59(10)   64(14)   67(17)   61(18)
                      63(5)    65(4)    68(13)   63(22)
                      59(24)   66(8)    68(21)   64(19)
Treatment averages      61       66       68       61
Grand average           64       64       64       64
Difference              −3       +2       +4       −3
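As a hedged supplement rather than a transcription of Table 1b, the sketch below recomputes the usual one-way analysis-of-variance quantities, the treatment and residual sums of squares and the F ratio, directly from the coagulation times in Table 1a; the variable names are arbitrary.

# One-way analysis of variance computed from the Table 1a coagulation times.
import numpy as np

diets = {
    "A": [62, 60, 63, 59, 63, 59],
    "B": [63, 67, 71, 64, 65, 66],
    "C": [68, 66, 71, 67, 68, 68],
    "D": [56, 62, 60, 61, 63, 64],
}

all_obs = np.concatenate([np.array(v, dtype=float) for v in diets.values()])
grand_mean = all_obs.mean()                                   # the grand average, 64

# Between-treatment and within-treatment (residual) sums of squares
ss_treat = sum(len(v) * (np.mean(v) - grand_mean) ** 2 for v in diets.values())
ss_resid = sum(((np.array(v) - np.mean(v)) ** 2).sum() for v in diets.values())

df_treat = len(diets) - 1                                     # 3
df_resid = len(all_obs) - len(diets)                          # 20
f_ratio = (ss_treat / df_treat) / (ss_resid / df_resid)
print(ss_treat, ss_resid, round(f_ratio, 2))                  # 228.0, 112.0 and about 13.57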

1 President of the American Statistical Association in 1978, President of the Institute of Mathematical Statistics in 1979. Professor Box received the British Empire Medal in 1946, the Shewhart Medal from the American Society for Quality Control in 1968, the Wilks Memorial Award from the American Statistical Association in 1972, the R. A. Fisher Lectureship in 1974, and the Guy Medal in Gold from the Royal Statistical Society in 1993. He is very well known for his work in Experimental Designs, Time Series, and Regression, with his name on many important techniques including the Box-Cox power transformation, Box-Jenkins time series models, the Box-Muller transformation and Box-Behnken designs. He has also authored many well-known texts in time series and stochastic control, Bayesian statistics and experimental design, and with Norman Draper he invented the concept of evolutionary operation (book published in 1969 by Wiley).


PORTFOLIO THEORY

Harry M. Markowitz 1

Winner of the Nobel Memorial Prize in Economic Sciences in 1990

Professor, Rady School of Management, University of California, San Diego, USA

Portfolio Theory considers the trade-off between some measure of risk and some measure of return on the portfolio-as-a-whole. The measures used most frequently in practice are expected (or mean) return and variance or, equivalently, standard deviation. This article discusses the justification for the use of mean and variance, sources of data needed in a mean-variance analysis, how mean-variance tradeoff curves are computed, and semi-variance as an alternative to variance.
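As a hedged illustration of the two portfolio-as-a-whole measures just mentioned, the sketch below computes the expected return, variance and standard deviation of a fixed portfolio; the expected returns, covariance matrix and weights are purely illustrative assumptions, not data from this article.

# Expected return and variance of a portfolio-as-a-whole, from illustrative inputs.
import numpy as np

mu = np.array([0.06, 0.10, 0.04])        # hypothetical expected returns per asset
cov = np.array([[0.04, 0.01, 0.00],      # hypothetical covariance matrix of returns
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.01]])
w = np.array([0.5, 0.3, 0.2])            # portfolio weights, summing to one

portfolio_mean = w @ mu                  # expected (mean) portfolio return
portfolio_var = w @ cov @ w              # portfolio variance
portfolio_sd = np.sqrt(portfolio_var)    # equivalently, standard deviation
print(portfolio_mean, portfolio_sd)

Tracing out such (standard deviation, expected return) pairs over the set of admissible weight vectors is what produces the trade-off curve discussed below.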

1. MEAN-VARIANCE ANALYSIS AND ITS JUSTIFICATION

While the idea of trade-off curves goes back at least to Pareto, the notion of a trade-off curve between risk and return (later dubbed the efficient frontier) was introduced in Markowitz (1952). Markowitz proposed expected return and variance as both a hypothesis about how investors act and as a rule for guiding action in fact. By Markowitz (1959) he had given up the notion of mean and variance as a hypothesis but continued to propose them as criteria for action.

Tobin (1958) said that the use of mean and variance as criteria assumed either a quadratic utility function or a Gaussian probability distribution. This view is sometimes ascribed to Markowitz, but he never justified the use of mean and variance in this way. His views evolved considerably from Markowitz (1952) to Markowitz (1959). Concerning these matters Markowitz (1952) should be ignored. Markowitz (1959) accepts the views of Von Neumann and Morgenstern (1944) when probability distributions are known, and of Leonard J. Savage (1954) when probabilities are not known. The former asserts that one should maximize expected utility; the latter asserts that one should maximize expected utility using probability beliefs when objective probabilities are not known.

1 Professor Markowitz has applied computer and mathematical techniques to various practical decision making areas. In finance: in an article in 1952 and a book in 1959 he presented what is now referred to as MPT, “modern portfolio theory.” This has become a standard topic in college courses and texts on investments, and is widely used by institutional investors for asset allocation, risk control and attribution analysis. In other areas: Dr. Markowitz developed “sparse matrix” techniques for solving very large mathematical optimization problems. These techniques are now standard in production software for optimization programs. Dr. Markowitz also designed and supervised the development of the SIMSCRIPT programming language. SIMSCRIPT has been widely used for programming computer simulations of systems like factories, transportation systems and communication networks. In 1989 Dr. Markowitz received The John von Neumann Award from the Operations Research Society of America for his work in portfolio theory, sparse matrix techniques and SIMSCRIPT. In 1990 he shared The Nobel Prize in Economics for his work on portfolio theory.


STATISTICS' ROLE IN ENVIRONMENTAL MONITORING

Jennifer Brown 1

President of the New Zealand Statistical Association

Head, Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand

Environmental monitoring is conducted to provide information on status, and changes in status, of an environmental system. Often monitoring is associated with an impact, such as a proposed land development activity or rehabilitation of a habitat. At other times, environmental monitoring is conducted to assess the success (or failure) of a new environment management strategy or change in strategy. Monitoring can also be carried out to provide information on the overall status of a land or water area of special interest in fields such as biodiversity, the welfare of an endangered species or the abundance of a pest species.

In all these examples, the common theme is that monitoring is conducted to provide information. This information may be used in reports and articles that are created to bring about change in management. Typically, such information is numerical – a summary statistic, or a set of data – or some type of numerical measure. This is the role of statistics – statistics is the process used to collect and summarise data to provide relevant information for environmental monitoring.

Some underlying principles apply to information collected from environmental monitoring. The first is that in any monitoring design, the aims and objectives need to be clearly stated, both in a temporal and a spatial scale. The most successful monitoring programmes have aims and objectives that can be quantified to guide development of the survey design and data analysis (Gilbert 1987).

The survey design for the monitoring programme should specify how information is to be collected. Various survey designs can be used, and the important criterion is that they should provide a sample that is representative of the population and provide information relevant to the survey objective. The population can be considered to be an area of land or water that has fixed and delineated boundaries. A reserve or national park is an example of such a population. Other populations may be a species of interest, e.g. a bird population. In the example of a bird species, the population may be finite, although of unknown size, but the spatial boundaries may be unknown if the birds are highly mobile. In other applications, defining the population may be very difficult. For example, when monitoring the impact of a new industrial development in a rural area, delineating the area beyond which there is unlikely to be an effect may be very difficult.

Sample designs that use an element of probability (probability sampling)

1 Professor Brown is currently an Associate Editor of the Journal of Agricultural, Biological and Environmental Statistics, of the Australian and New Zealand Journal of Statistics, and of the International Journal of Ecological Economics & Statistics.
