

3

Review papers

Graphical displays

Leland Wilkinson SYSTAT, Inc. and Department of Statistics, Northwestern University

This paper selectively reviews the field of scientific and medical graphics. Examples are provided using world health statistics, and several new methods of presenting multivariate data are introduced.


1 Introduction

Figure 1 shows one of the results of a survey by Cleveland on the use of graphics in scientific articles. The horizontal position of each dot represents the proportion of total page area devoted to graphs in 50 articles sampled from the 1980-81 volumes of each journal. Cleveland's analysis of these data showed that the differences among journals were due more to the number of graphs appearing in the articles than to the size of the graphs.

Figure 1 Proportion of article area devoted to graphs in selected scientific journals

Address for correspondence: Leland Wilkinson, Adjunct Professor of Statistics, Department of Statistics, Northwestern University, Evanston IL 6020, USA; President, SYSTAT, Inc., 1800 Sherman Ave., Evanston IL, USA.

Downloaded from smm.sagepub.com at UNIVERSITY OF WATERLOO on December 16, 2014


Perhaps some scientific disciplines require clear writing and few graphs. Perhaps graphs cannot express the complexity of theories in some fields. Perhaps graphs are fuzzy and numbers precise. If so, Cleveland's findings are anomalous, because some of the most quantitative and mature journals in his sample are the most graphically oriented.

Medical and biological journals fall somewhat above average in Cleveland's survey, so a review article in this journal urging the use of graphs would be preaching to the converted. Many reviews or critical papers in the statistical graphics field have already been published,2-8 and several texts are available.9-19 Consequently, this review will focus on selected areas of statistical graphics relevant to medical research. The major sections will be devoted to one-variable, two-variable, and many-variable graphs.

Two major areas will not be covered. The first is the actively developing research field of dynamic graphics.20-31 Although these procedures have proven valuable for exploratory data analysis, they will not be covered in this review because our concern is with presentation graphics. Nevertheless, developments in this field will profoundly affect the appearance of multivariate graphs in the future.

The second area involves graphical perception and the research on effective graphic design. This is a large field shared by psychologists and statisticians. Psychologists have developed theories and experiments on the perception of various graphic elements and whole graphs.32-42 Several statisticians have worked independently on these problems and have provided guidelines for effective displays based on their research.43-49

The point of view throughout this review is that effective statistical presentation graphics should include the raw data whenever possible. For one-, two-, and even multivariable displays, it is usually possible to overlay structural summaries on the raw data. There are exceptions, of course, but computer graphics now make it possible to enhance raw data displays so that a balance is achieved between summary and fine detail. For journal articles and other archival displays, this is especially important, since a graph may be the only access readers have to the original data. This point of view also excludes many common but ineffective applications of popular graphical displays such as histograms, bar charts, standard error bars, and ANOVA interaction plots. More suitable alternatives will be offered.

2 One-variable graphs

Figure 2 shows the most common single-variable display: the histogram. The data are life expectancies in 17 countries for males and females, compiled by the World Health Organization. The histogram has ordinarily been used for two purposes: counting and density display. It is effective for neither.

For counting, there are better displays. Figure 3 is one: a simple tally. It resembles the kind of sideways histogram produced on mainframe computers by older statistical packages, with stacks of '*' or 'X' or '0' for the bars. This display is different, however, because the size of the circles is infinitely adjustable. The size of the circles was chosen here to provide the same resolution as the histogram in Figure 2.

A stem and leaf diagram is a tally which provides more information. Figure 4 shows one for the same data. Due to Tukey,50 this display splits the data values into two parts: a stem (printed on the left) and a leaf (on the right). In between the two are printed the 'hinges' of the data, which correspond roughly to the upper and lower quartiles. The median is also normally printed, but is covered by the upper hinge here because of the extreme negative skewness.

The stem and leaf diagram allows us to inspect the 'graininess' of the data by revealing the leftmost two digits. In displaying the data values, it provides the same shape information that the tally does. For small samples, this dual role makes the stem and leaf diagram preferable to the tally shown in Figure 3.

Figure 2 Histogram of life expectancy data

Figure 3 Dot plot (tally) of life expectancy data

Figure 4 Stem and leaf diagram of life expectancy data

Figure 5 Fuzzygram of life expectancy data

As a density display, the histogram is rather poor. The main problem is the choice of the number of bars. Many introductory statistics books declare this to be a fixed number, say 15. In fact, an optimal number usually depends on the size of the sample and the shape of the distribution. Sturges,51 Doane,52 Scott53 and Diaconis and Freedman54 offer methods for choosing the number of bars or bar widths. Several statistical packages are programmed to make this choice automatically.

Another problem is the intrinsic discreteness of the histogram versus the continuity of the underlying distribution which must be displayed. Wilkinson55 attempted to remedy this problem by superimposing an asymptotic cumulative normal distribution on the histogram bars. Figure 5 shows this 'Fuzzygram' for the life expectancy data. The display is fuzzier for small samples than for large and converges on the population distribution as the sample size increases. Cognitively, it is an attempt to suppress counting and to provide instead visual confidence limits for thinking about possible population distributions.

Figure 6 Density polygon of life expectancy data

Figure 7 Kernel smooth of life expectancy data
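A stem and leaf diagram with Tukey's hinges takes only a few lines of code. The sketch below is an illustration, not SYSTAT's implementation. It assumes non-negative integer data, such as life expectancies in whole years, and uses tens digits as stems and units digits as leaves. The stem row containing a hinge is marked H and the row containing the median is marked M. H takes precedence over M, which is why the median can be "covered" by a hinge, as described above. The function names are hypothetical.

```python
from collections import defaultdict


def tukey_hinges(values):
    """Return (lower hinge, median, upper hinge) using Tukey's depth rule."""
    x = sorted(values)
    n = len(x)

    def at_depth(depth, from_top=False):
        # Depths are 1-based; a depth ending in .5 averages two neighbours.
        seq = x[::-1] if from_top else x
        i = int(depth) - 1
        if depth == int(depth):
            return seq[i]
        return (seq[i] + seq[i + 1]) / 2

    median_depth = (n + 1) / 2
    hinge_depth = (int(median_depth) + 1) / 2
    return (at_depth(hinge_depth), at_depth(median_depth),
            at_depth(hinge_depth, from_top=True))


def stem_and_leaf(values):
    """Return display lines: stem, marker (H, M or blank), leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)  # assumes non-negative integers
        leaves[stem].append(str(leaf))
    lower, median, upper = tukey_hinges(values)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        lo, hi = 10 * stem, 10 * stem + 10
        if lo <= lower < hi or lo <= upper < hi:
            mark = "H"  # a hinge covers the median marker
        elif lo <= median < hi:
            mark = "M"
        else:
            mark = " "
        lines.append(f"{stem:>3} {mark} {''.join(leaves[stem])}")
    return lines
```

For the made-up values 45, 52, 58, 61, 63, 67, 70, 71, 72, 74, 75, 76, 77, the hinges are 61 and 74 and the median is 70. The display therefore reads `  4   5`, `  5   28`, `  6 H 137`, `  7 H 0124567`. The upper hinge hides the median, just as in the skewed life expectancy data.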

There are numerous ways to profile a density. Technically, a continuous density cannot be estimated at a point, but the idea of smoothing is appealing because densities are more easily interpreted than cumulative distributions. The simplest method is to connect the centres of the histogram bar tops with a single line. Figure 6 shows a simple polygon display of the life expectancy data.
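The bar-count rules cited above and the density polygon fit naturally into one sketch, because the polygon is built from a histogram. The code assumes equal-width bins spanning the data range. It uses the commonly quoted forms of three rules:

- Sturges' number of bins, ceil(log2 n) + 1.
- Scott's width, 3.49 s n^(-1/3).
- The Freedman-Diaconis width, 2 IQR n^(-1/3).

Doane's skewness correction is omitted. The polygon joins the bar-top centres. As one common convention (an assumption here), it is closed to zero at the centres of the empty bins just beyond each end, so its area equals the histogram's area of 1.

```python
import math
import statistics


def sturges_bins(n):
    """Sturges' rule: number of equal-width bins for n observations."""
    return math.ceil(math.log2(n)) + 1


def scott_width(values):
    """Scott's normal-reference bin width."""
    return 3.49 * statistics.stdev(values) * len(values) ** (-1 / 3)


def freedman_diaconis_width(values):
    """Freedman-Diaconis bin width based on the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2 * (q3 - q1) * len(values) ** (-1 / 3)


def frequency_polygon(values, k=None):
    """Vertices of a density polygon through the bar-top centres of a k-bin histogram."""
    n = len(values)
    k = k or sturges_bins(n)
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes the data are not all equal
    counts = [0] * k
    for v in values:
        counts[min(int((v - lo) / width), k - 1)] += 1  # maximum goes in the last bin
    # Heights are densities, count / (n * width), so the bars have total area 1.
    vertices = [(lo - width / 2, 0.0)]
    vertices += [(lo + (i + 0.5) * width, c / (n * width)) for i, c in enumerate(counts)]
    vertices.append((hi + width / 2, 0.0))
    return vertices
```

Using densities rather than raw counts for the heights keeps polygons for samples of different sizes on a common scale.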

Kernel density estimates go one step further. For n observations, they spread the probability mass 1/n at each point over a smooth curve. This is done by using a kernel function which works like the moving windows used in time series smoothers. Tarter and Kronmal,56 Silverman,57 and Wegman58 summarize these methods. There are problems associated with the choice of smoothing kernel which are analogous to the choice of a bar width for histograms (though less sensitive), but optimal choices have been identified for most practical problems. Wand et al.59 discuss ways to kernel smooth mixtures of distributions to refine this choice even further. Finally, Scott60 provides computational algorithms for smoothing histograms which are several times faster than kernel methods and can be used effectively on high-dimensional displays and large datasets. Figure 7 shows a kernel smooth of the life expectancy data, superimposed on the histogram.

Histograms and smoothed densities are not effective for displaying location and spread. The schematic plot, or box plot,50 is useful for this purpose. Figure 8 shows a box plot of the life expectancy data. The centre line in the box marks the median, and the edges of the box mark the hinges, which are approximately the lower and upper quartiles. Many variations on the original box plot have been introduced, most based on sample quantiles rather than the 'letter values' originally defined by Tukey. In a survey of major statistical packages, Frigge, Hoaglin, and Iglewicz61 found that only Minitab, SYSTAT, and Data Desk produce the median, box size, and whisker length specified by Tukey.

A drawback of the box plot is that it can conceal sample size and multimodality. The symmetric dot plot can be combined with Tukey's box plot to reveal local sample features. Figure 9 shows an example using the life expectancy data. If the dots are kept hollow and the lines thin, they do not interfere with the perception of the box plot itself.

Many methods have been proposed for comparing distributions to each other or to a theoretical distribution. A simple method is to plot the fractiles of each distribution against each other. If the distributions are identical, the points in the plot will lie on a line. This 'quantile' or 'probability' plot is often recommended because it does not depend on a choice of class interval, and its sensitivity depends on deviations from linearity, which are easy to detect visually.
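The kernel density estimate described above can be sketched with a Gaussian kernel. Each of the n observations contributes a normal curve carrying mass 1/n, and the estimate is their sum. The default bandwidth is Silverman's rule of thumb as it is commonly stated, 0.9 min(s, IQR/1.34) n^(-1/5). That is one of several published choices. This is an illustration, not the smoother used for Figure 7.

```python
import math
import statistics


def silverman_bandwidth(values):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    s = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = min(s, (q3 - q1) / 1.34) or s  # fall back to s if the IQR is zero
    return 0.9 * spread * len(values) ** (-1 / 5)


def gaussian_kde(values, bandwidth=None):
    """Return a function that evaluates the kernel density estimate at x."""
    h = bandwidth or silverman_bandwidth(values)
    scale = 1 / (len(values) * h * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of n normal kernels centred on the data, each with mass 1/n.
        return scale * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)

    return density
```

To superimpose the curve on a histogram, evaluate `density` on a fine grid of x values spanning the data, as in Figure 7.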

Figure 8 Box plot of life expectancy data

Figure 9 Dot box plot of life expectancy data

Figure 10 illustrates probability plots for a variety of distributions. Each distribution was randomly sampled with n=50. The top of the figure shows the density and distribution functions for each distribution. Directly below are the kernel densities for the samples. The next row shows the sample cumulative distribution functions. Next are the normal probability plots. The last row shows the probability plot for the correct distribution. Each plot in this last row should tend toward a straight line. The examples in the next-to-last row show how typical non-normal data should look in normal probability plots.
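The coordinates of a normal probability plot are found by pairing the sorted data with standard normal quantiles taken at plotting positions. The sketch below uses Blom's positions, (i - 3/8)/(n + 1/4), which are one of several conventions, and Python's `statistics.NormalDist` for the inverse normal CDF. The reference line has intercept equal to the sample mean and slope equal to the sample standard deviation. That choice is also an assumption.

```python
from statistics import NormalDist, mean, stdev


def normal_probability_plot(values):
    """Return (theoretical normal quantile, ordered datum) pairs."""
    x = sorted(values)
    n = len(x)
    z = NormalDist()  # standard normal
    positions = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    return [(z.inv_cdf(p), xi) for p, xi in zip(positions, x)]


def reference_line(values):
    """Intercept and slope of the line a normal sample should roughly follow."""
    return mean(values), stdev(values)
```

Plotting the pairs with the theoretical quantile on the horizontal axis gives a display like the normal rows of Figure 10. For another reference distribution, replace `inv_cdf` with that distribution's quantile function.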

Figure 11 shows a probability plot for our life expectancy data against a theoretical normal distribution. For comparison purposes, the ordinary and cumulative histograms have been included inside the plot with superimposed normal curves based on the sample mean and standard deviation. The diagonal straight line is fitted through the sample fractiles corresponding to a perfect normal. Somewhat surprisingly, the deviation from linearity in the probability plot is not as apparent as the deviations from the normal density in the histogram. Despite its theoretical shortcomings, the histogram appears to perform somewhat better on these data than the probability plot or the empirical cumulative distribution. Even though the probability plot linearizes things, it appears that densities can be easier to understand than cumulatives.

We are left with the question of how different from linear a probability plot should be before we become alarmed. One of the most powerful tests for non-normality, the Shapiro-Wilk test,62 is based on the concept of the normal probability plot, so we could use the plot to supplement the test. On these data, the test statistic happens to be significant at p=0.014 (W=0.92), although the Shapiro-Wilk test is known to be somewhat liberal in rejecting the null hypothesis of normality.

Figure 10 Probability plots for a variety of distributions

Another simple method is to superimpose random samples from the desired probability distribution on the probability plot. Figure 12 does this for the normal plot. Twenty normal random samples are plotted on the same scale in dotted lines, and the sample data in solid. Each random sample is based on the life expectancy mean and standard deviation. Smooth confidence limits based on Lilliefors' test for normality can also be added to the normal probability plot.63
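The simulation idea behind Figure 12 is straightforward to reproduce:

1. Draw samples of the same size from a normal distribution with the data's mean and standard deviation.
2. Sort each sample.
3. Compare the samples with the sorted data at each rank.

The sketch below also reduces the simulated samples to a pointwise envelope (the minimum and maximum at each rank). That envelope is a summary added here, not part of the figure. The seed and the function names are illustrative.

```python
import random
from statistics import mean, stdev


def simulated_normal_samples(values, n_samples=20, seed=1):
    """Sorted normal samples matching the size, mean and SD of the data."""
    rng = random.Random(seed)
    mu, sigma, n = mean(values), stdev(values), len(values)
    return [sorted(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(n_samples)]


def envelope(samples):
    """Pointwise minimum and maximum of the sorted samples at each rank."""
    ranks = list(zip(*samples))
    return [min(r) for r in ranks], [max(r) for r in ranks]


def outside_envelope(values, samples):
    """Ranks at which the ordered data fall outside the simulated envelope."""
    lower, upper = envelope(samples)
    return [i for i, v in enumerate(sorted(values)) if not lower[i] <= v <= upper[i]]
```

Each simulated sample can be drawn as a dotted line against the same normal quantiles as the data. With only 20 samples, an occasional point just outside the envelope is expected even for normal data.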

Figure 11 Normal probability plot of life expectancy data

Figure 12 Normal probability plot of life expectancy data plus 20 normal random samples

3 Two-variable graphs

Two-way plots can include various combinations of continuous and categorical variables. One extension of the single-variable graphs we have seen is to repeat them across a grouping variable to display subsamples or subpopulations. Figure 13 shows an example for the life expectancy data. The dot box plots have been plotted separately for males and females, revealing the heterogeneity in the data. This display works well for more than two groups and is especially suited for graphing analysis of variance data.

ANOVA data are frequently displayed with factorial plots (interaction plots). While these are intrinsically multivariate, it is easier to discuss them here among two-variable graphs. Figure 14 shows two graphs commonly used to display factorial data: the bar chart with error bars and the line graph with error bars. Neither is useful. Both conceal the data. Furthermore, by featuring symmetric standard deviations or standard errors, these graphs disguise potential skewness.

Since the graphs in Figure 14 are intended to reveal mean differences and interactions, these features can easily be added to a display which does not conceal the data. Figure 15 is an example. This dual dot plot has been used to show all data values, and the lines connect the means of the four groups. There is no need for error bars, because they would only restate the spread of the data values, which the dots already show. Text accompanying the graphs can summarize significant differences. Examples of similar graphs have appeared in the statistical and medical literature,64,65 and the 'population pyramid' graph showing back-to-back histograms is almost a century old. Sometimes the groups are blended into single distributions with filled and open circles identifying the groups. This tends to conceal the shape of the separate distributions, however.

Figure 13 Dot box plot of life expectancy data grouped by gender

Figure 14a Bar graph for results of typical factorial experiment

Figure 14b Line graph for results of typical factorial experiment

Figure 15 Dual dot plot for results of typical factorial experiment

The most common two-dimensional continuous variable data display is the scatterplot. Figure 16 is an example. Here we plot the male life expectancy values against the female. The correlation is strong, but the negative skewness on both variables results in most observations bunching in the upper right corner.

There are many ways to enhance a scatterplot. The life expectancy data appear to show a linear relationship between males and females, but the skewness is not well highlighted. Figure 17 shows how a bivariate density superimposed on the scatterplot reveals the joint and marginal skewness. The kernel smoother is a nice diagnostic device for checking assumptions behind bivariate linear modelling.

Figure 18 shows another example of how a bivariate kernel can highlight subtle structure in a scatterplot. These data are birth and death rates per year per 100 000 people for 75 selected countries. The bivariate kernel contours are superimposed to show the joint sample distribution, and selected points are labelled. The zero population growth line at the left of the plot discriminates countries like Hungary, which are losing population, from countries like Guatemala, which are gaining rapidly.
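The bivariate kernel densities behind contour displays like Figures 17 and 18 extend the one-dimensional idea: each point contributes a small two-dimensional bump. The sketch below assumes a product Gaussian kernel with a separate bandwidth for each axis. By default it uses Scott's rule for two dimensions, s n^(-1/6). It evaluates the density on a grid, which any contouring routine could then draw. It illustrates the idea and is not the smoother used for the figures.

```python
import math
import statistics


def scott_bandwidths(points):
    """Per-axis Scott's-rule bandwidths for a two-dimensional kernel."""
    xs, ys = zip(*points)
    factor = len(points) ** (-1 / 6)
    return statistics.stdev(xs) * factor, statistics.stdev(ys) * factor


def bivariate_kde(points, hx=None, hy=None):
    """Return f(x, y), a product-Gaussian kernel density estimate."""
    if hx is None or hy is None:
        hx, hy = scott_bandwidths(points)
    scale = 1 / (2 * math.pi * hx * hy * len(points))

    def density(x, y):
        # Each point contributes a bump of mass 1/n with axis-wise spreads hx, hy.
        return scale * sum(
            math.exp(-0.5 * (((x - px) / hx) ** 2 + ((y - py) / hy) ** 2))
            for px, py in points
        )

    return density


def density_grid(density, xs, ys):
    """Evaluate the density on a rectangular grid (rows follow ys)."""
    return [[density(x, y) for x in xs] for y in ys]
```

Contour levels drawn from this grid outline the concentrations of points. The per-axis bandwidths keep variables measured on very different scales, such as birth and death rates, from being smoothed by the same amount.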

Figure 16 Scatterplot of male against female life expectancy

Figure 17 Scatterplot enhanced by bivariate kernel density

Figure 18 Bivariate kernel density of birth and death rates

This graph reveals a disquieting nonlinearity and bimodality in world health statistics. Developed nations show varying birth rates but relatively low death rates. Underdeveloped nations have extremely high birth rates and high death rates. Sometimes a graph tells everything one needs to know about the data. Some graphs elude parsimonious mathematical modelling. This is an example.

Nonlinear smoothers are another powerful diagnostic and exploratory tool. There are many methods and numerous references, including surveys.66-71 Superimposed on two-dimensional scatterplots, nonlinear smoothers reveal, not surprisingly, nonlinearity. Figure 19 shows a LOWESS smoother72 through the world health data. Examining a graph with a robust smoother is a good antidote to the thoughtless fitting of straight lines with least squares.
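A compact LOWESS in the spirit of Cleveland's smoother can be written directly. At each x, fit a weighted straight line to the nearest fraction f of the points using tricube weights. Then repeat the fit a few times with bisquare robustness weights that downweight points with large residuals. The defaults below (f = 2/3, three robustness iterations) follow commonly used settings but are assumptions here. This O(n^2) version fits at every data point rather than interpolating between fits, as production implementations do.

```python
import math
import statistics


def lowess(x, y, f=2 / 3, iterations=3):
    """Robust locally weighted linear regression; returns fitted values at each x."""
    n = len(x)
    r = min(n, max(2, math.ceil(f * n)))  # neighbourhood size
    robust = [1.0] * n
    fitted = list(map(float, y))
    for _ in range(iterations + 1):
        for i in range(n):
            # Distance to the r-th nearest neighbour sets the local window.
            h = sorted(abs(xj - x[i]) for xj in x)[r - 1] or 1e-12
            sw = swx = swy = swxx = swxy = 0.0
            for xj, yj, rj in zip(x, y, robust):
                u = abs(xj - x[i]) / h
                w = rj * (1 - u ** 3) ** 3 if u < 1 else 0.0  # tricube times robustness
                sw += w
                swx += w * xj
                swy += w * yj
                swxx += w * xj * xj
                swxy += w * xj * yj
            if sw == 0:
                continue  # keep the previous fit if every neighbour was rejected
            denom = sw * swxx - swx ** 2
            if abs(denom) <= 1e-12 * sw * sw:
                fitted[i] = swy / sw  # neighbours share one x: use the weighted mean
            else:
                slope = (sw * swxy - swx * swy) / denom
                fitted[i] = (swy - slope * swx) / sw + slope * x[i]
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        s = statistics.median(abs(e) for e in residuals)
        if s <= 1e-12 * max(1.0, max(abs(v) for v in y)):
            break  # essentially perfect fit: no robustness step needed
        # Bisquare weights: residuals beyond six median absolute residuals get weight 0.
        robust = [(1 - (e / (6 * s)) ** 2) ** 2 if abs(e) < 6 * s else 0.0
                  for e in residuals]
    return fitted
```

The robustness passes are what make the smoother resistant to outliers. A single wild point pulls the first-pass fit toward itself, but after one bisquare reweighting it no longer influences its neighbours.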

Figure 19 LOWESS smooth of birth and death rates

4 Multivariable graphs

Multivariable graphs can contain continuous and categorical variables. We have already examined categorical ANOVA displays, so now we will focus on continuous variables. For jointly continuous variables, the scatterplot matrix (SPLOM) is the simplest general display. The SPLOM is simply an array of scatterplots.14,73 Rectangular SPLOMs show plots of one set of variables (rows) against another (columns). Square or triangular SPLOMs show a set plotted against itself.

Figure 20 is an example which includes the birth and death rate data plus three additional variables involving annual educational, health, and military spending per capita in adjusted US dollars. On the diagonal are the marginal histograms for each variable. This type of SPLOM reveals joint and marginal distributions. Notice that the three social expenditure variables are all positively skewed.

Figure 21 shows the same SPLOM after logging the social expenditure variables. The heteroscedasticity and nonlinearity in many of the cells appear to be cured by this transformation, although the substantial nonlinearity between the birth and death rate variables remains because we did not transform them.

Now let's examine various methods for highlighting structure in SPLOMs. With many variables and tiny panels, SPLOMs can be difficult to read. We will apply several of the enhancement methods for single scatterplots to SPLOMs to see their effectiveness. Figure 22 shows the bivariate kernel smoother on these data. Notice the bimodality in most of the plots (and even trimodality in the DEATH-HEALTH plot). These data do not appear to be bivariate normal even after the log transformations.

Figure 20 Scatterplot matrix of birth, death, and social expenditure variables

Figure 21 Scatterplot matrix of birth, death, and logged social expenditure variables

Figure 23 shows the LOWESS smoother on the same SPLOM. Substantial nonlinearity, particularly involving the birth and death rate variables, is highlighted. While it might be useful to apply parametric or nonparametric tests of significance for nonlinearity to individual plots, it can be more useful to observe overall structure without focusing on one plot in isolation.

Since numerical scales are lacking (or would be too small to read), scale information is impossible to perceive in a SPLOM. One remedy is to connect the points in each panel with a minimum spanning tree. This tree joins all points in a plot such that the total length of the tree branches (in Euclidean distance) is as short as possible. Figure 24 shows this tree superimposed on the scatterplot matrix.

The trees reveal two blocks of variables measured on similar scales: (BIRTH, DEATH) and (EDUCATE, HEALTH, MILITARY). Within blocks, the tree branches head in all directions. Between blocks (lower left corner) they are mostly parallel to the (EDUCATE, HEALTH, MILITARY) axes and are much longer in that direction. Because the tree minimizes distances in the original units, it avoids steps along a variable with a long numerical scale, so its branches run in the direction of the shorter-scaled variable. This means that the scales for BIRTH and DEATH are longer than those for EDUCATE, HEALTH, and MILITARY.
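The minimum spanning tree for one SPLOM panel can be computed with Prim's algorithm, shown below in its simple O(n^2) form. Distances are Euclidean in the panel's original units, which is exactly why the tree carries scale information. The function name and the tuple-based point format are illustrative assumptions.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on Euclidean distances; returns a list of (i, j) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```

For a SPLOM, run this once per panel on that panel's pair of raw (unstandardized) columns and draw each edge as a line segment. Standardizing the columns first would erase the very scale differences the tree is meant to reveal.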

Finally, Figure 25 shows an influence SPLOM. In this plot, the size of each point is proportional to its contribution to the Pearson correlation in its respective panel. Negative contributions are represented by filled symbols and positive ones by hollow symbols. Large points are thus outliers from a bivariate normal perspective.

There are numerous other ways to represent three or more of these variables graphically to reveal joint structure. A simple generalization of the two-variable plots is to use size, colour, or shading to represent additional variables. Figure 26 is a 'bubble plot'. The data are the world health statistics, and the size of the circles is proportional to the log of annual per capita health expenditures. Overall, per capita health expenditures are largest in the developed nations, but there are exceptions (e.g. Gabon and Venezuela). Gabon is more notable because its relatively high health expenditure is accompanied by a relatively high death rate. (The labels have been omitted, but can be located in Figure 18.)
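The influence values behind an influence SPLOM like Figure 25 can be computed panel by panel. The text does not specify the exact measure, so the sketch below assumes a leave-one-out definition: each point's contribution is the full-sample Pearson correlation minus the correlation with that point deleted. A positive value means the point strengthens the correlation, and a negative value means it weakens it. The absolute value could set the symbol size, and the sign could choose filled or hollow symbols.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_influence(x, y):
    """Leave-one-out influence of each point on r (assumed definition; x, y are lists)."""
    r = pearson(x, y)
    return [r - pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]
```

The leave-one-out form needs at least three or four points per panel so that the reduced correlations are defined. For the large samples typical of SPLOMs, its cost of n correlations per panel is still modest.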

Sometimes smoothing can be compounded. Figure 27 shows three separate contours superimposed on the BIRTH-DEATH plot featured in Figure 18. The BIRTH-DEATH kernel smooths are retained to highlight the data concentration. The shading variables are, respectively, the logs of EDUCATE, HEALTH, and MILITARY expenditures. The most interesting feature of these plots is the concentration of military spending in a small cluster of underdeveloped countries (e.g. Libya). Expenditures, of course, are concentrated in the developed nations as well, but military expenditures in relatively unstable countries (in terms of birth and death rates) are troubling.

While these smooths are aesthetically appealing, they do not offer as much information as the bubble plot in Figure 26. Substituting a smooth for raw data is only useful when there are no outliers from the smooth. Even less useful are the popular 3-D mesh plots. Figure 28 shows the surface which produced the health expenditure contours in Figure 27. Spikes to the surface indicate the location of the data values. It is difficult to interpolate or extrapolate data values from this plot. In general, 3-D plots should be used only to augment information perceivable in contours, perhaps as small accompanying surfaces alongside a contour plot. As Becker and Cleveland74 point out, making realistic 'scenes' out of abstract data does not always contribute to understanding.

Figure 22 Scatterplot matrix with bivariate kernel densities

Figure 23 Scatterplot matrix with LOWESS smooths

Figure 24 Scatterplot matrix with minimum spanning tree

Figure 25 Scatterplot matrix with correlation influences

Figure 26 Bubble plot of health expenditures on birth and death rates

Sometimes the most effective display for n observations on p variables is to plot the data matrix itself. For this display to be effective, the data must be scaled to be comparable across rows and columns, the rows and columns must be permuted to reveal potential structure, and symbols, shading, or colour must be chosen to highlight the variation in data values. When rows and columns are independent, this display is less useful, but elsewhere the direct display of the data matrix can reveal patterns which are obscured by other multivariate statistical and graphical summaries.

There are many methods for permuting a matrix to a simple structure. Various methods are optimized for different potential structures. Hartigan75 discussed permuting matrices to block structures. Wilkinson76 focused on seriation (one-dimensional) structures.

After standardizing and permuting, the matrix can be displayed by printing the numbers in single-digit cells or by using shading or colour to display the data values as 'pixels'.77 Missing data can be displayed with special characters or blanks. The result looks like a contouring of the data matrix itself.

Figure 27 Contour plots of social expenditures against birth and death rates

Figure 29 shows a direct display of the world health data. First, the five variables (EDUCATE, HEALTH, MILITARY, BIRTH, DEATH) were standardized to values between 0 and 1 using minima and maxima within columns. This range was assigned to a grey scale (0=white to 1=black). Then the columns were ordered in expenditure and health blocks, and the rows were sorted on EDUCATE, which is the first variable and which correlates highly with the other expenditure variables and, implicitly, with GNP per capita itself. A simple, one-dimensional structure is visible. In the developed countries at the top of the matrix, expenditures are high and birth and death rates low. At the bottom of the matrix, expenditures are low and birth and death rates high.

Other features are salient. The Scandinavian countries, Canada and Switzerland, invest heavily in education and health. The USA and Libya invest the most per capita in their military forces. Gabon, Yemen, and Mauritania have high death rates despite moderate to high per capita investments in medical care.
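The three steps used for the direct display of the data matrix in Figure 29 (min-max scaling within columns, sorting rows on one column, and mapping values to grey levels) can be sketched as a character display. The ten-character grey ramp, the '?' symbol for missing values (given as `None`), and the function names are illustrative assumptions. The sketch also assumes that every column has at least one value. A graphics device would draw shaded cells instead of characters.

```python
GREY = " .:-=+*#%@"  # 0 = white ... 1 = black


def minmax_columns(rows):
    """Scale each column to [0, 1] using its own minimum and maximum; None stays None."""
    scaled_cols = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        lo, hi = min(present), max(present)
        span = hi - lo
        scaled_cols.append(
            [None if v is None else ((v - lo) / span if span else 0.0) for v in col]
        )
    return [list(r) for r in zip(*scaled_cols)]


def matrix_display(rows, labels, sort_col=0):
    """Rows sorted (descending) on one scaled column, each cell drawn as a grey character."""
    scaled = minmax_columns(rows)

    def sort_key(i):
        v = scaled[i][sort_col]
        return -1.0 if v is None else v  # missing values sink to the bottom

    lines = []
    for i in sorted(range(len(rows)), key=sort_key, reverse=True):
        cells = "".join(
            "?" if v is None else GREY[min(int(v * len(GREY)), len(GREY) - 1)]
            for v in scaled[i]
        )
        lines.append(f"{labels[i]:<12}{cells}")
    return lines
```

Ordering the columns into blocks, as was done for Figure 29, amounts to reordering the values within each row before calling `matrix_display`.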


Figure 28 Three dimensional plot of social expenditure variables

To the left of the display are symbols assigned from the results of a k-means cluster analysis on the standardized variables. Third World nations are represented by a triangle, developing nations by a square, and developed nations by a circle. The 'outliers' (Libya, Gabon, Congo, Iraq, and Yemen) are represented by a star. Notice that the symbols follow, for the most part, the order of the rows.

When there are not too many cases or variables, one can display the profiles of all variables for every case. Figure 30 shows profiles on the EDUCATE, HEALTH, MILITARY, BIRTH and DEATH variables for every case (with the latter three variables logged and all variables standardized to produce z scores). There are numerous other icon displays, in which the variation on each case is represented by features of a miniature icon: faces, profiles, stars, etc.78,79 In this display, the icon is a small bar chart. The cases are sorted in the array by EDUCATE, as in Figure 29. The shading of the icons was governed by the four-group k-means cluster analysis of the standardized variables. The white icons represent primarily Third World nations. The striped ones are developing nations. The grey ones are developed. And the black ones are mainly outliers, with relatively high birth and death rates as well as high military spending.

Most icon displays are not well suited for examining values of individual variables. They are more likely to be useful for detecting clusters of similar profiles. The bar icon, because it so plainly reveals the profiles, is perhaps the easiest to decode. For any icon display, one- or two-dimensional sorting on a principal component or other variable(s) greatly improves interpretation.
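The k-means clusters used for the symbols and shading can be found with Lloyd's algorithm, sketched below. It alternates two steps: assign each case to its nearest centre, then move each centre to the mean of its cases. Random initial centres from a seeded generator are an assumption of this sketch. Results can depend on the starting centres, which is why practical programs often try several starts and keep the best.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of coordinate tuples; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centre
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres
```

Run on standardized scores, as in the analysis above, the cluster labels can then index a set of symbols (triangle, square, circle, star) or shades.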

Although icons are popular in the visualization literature and papers involving them are numerous, it is hard to see how they offer an improvement on Figure 29. In rare instances, they can be used effectively as plotting symbols in a scatterplot to add a few dimensions.

Another alternative to icons is a parallel coordinate plot.75,80,81 This graph substitutes lines for the bars in Figure 30 and overlaps all the cases on a common scale.


Figure 29 Permuted matrix display of standardized variables (E=Education, H=Health, M=Military, B=Birth, D=Death)

Subgroups often must be plotted separately, as in Figure 31, in order to avoid clutter. If colour is available, subgroups can be assigned contrasting colours in a single plot. Figure 31 shows four parallel coordinate plots stratified by the k-means clusters. The scales are for the z scores produced by standardizing each variable. This is an excellent way to portray the output of k-means and other case clustering methods, because distributions on and across the variables are immediately visible.

Andrews82 developed a close relative of the parallel coordinate plot. Instead of plotting raw values, he assigned each variable to a different frequency of sine and cosine waveforms. Adding these waveforms together produces a trigonometric 'profile' for each case. As in a parallel coordinate plot of standardized scores, observations which are adjacent in variable space have profiles which are close together and similarly shaped. Unlike the parallel coordinate plot, Andrews' curves cannot easily be decoded to reveal original data values. Although the shape of the Andrews' curves depends on the order of assigning variables to frequencies, they have proved effective in uncovering cluster structure. Everitt11 discusses these displays further. Figure 32 contains Fourier plots for each of the nation clusters.

Figure 30 Bar icons of cases on education, health, military, birth and death

A 'biplot' represents cases and variables in the same graph.83 It contains principal components scores as well as the vectors of the components. The advantage of displaying both is that the vectors help define the directions in the plot as well as the relations among the variables, and the scores show a two-dimensional view of the overall multivariate dispersion of the cases. There are several meaningful standardizations for the vectors, summarized in Everitt.11 The biplot suggests other applications of the principal components decomposition to multivariate graphics. One can, for example, construct a SPLOM of biplots. More generally, one can use components in a variety of other multivariate displays as derived variables.
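Andrews' construction can be stated exactly. A case x = (x1, ..., xp) becomes the function f(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., plotted for -π ≤ t ≤ π. The sketch below evaluates these curves on a grid; the grid size and function names are assumptions. Because the sine and cosine terms are orthogonal over that interval, the integrated squared distance between two curves is π times the squared Euclidean distance between the cases. That is why neighbouring cases produce neighbouring curves.

```python
import math


def andrews_curve(x, t):
    """Value at t of Andrews' Fourier curve for the case x."""
    total = x[0] / math.sqrt(2)
    for j, v in enumerate(x[1:], start=1):
        k = (j + 1) // 2  # frequencies 1, 1, 2, 2, 3, 3, ...
        total += v * (math.sin(k * t) if j % 2 == 1 else math.cos(k * t))
    return total


def andrews_curves(cases, n_points=101):
    """Sample every case's curve at n_points values of t from -pi to pi."""
    ts = [-math.pi + 2 * math.pi * i / (n_points - 1) for i in range(n_points)]
    return ts, [[andrews_curve(x, t) for t in ts] for x in cases]
```

Feeding in standardized scores and drawing one panel per cluster gives a Fourier-plot display like the one described for the nation clusters. Reordering the variables changes the curves' shapes but not the distances between them.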

Figure 33 shows a biplot of the nations data using the correlation matrix of the variables. The symbols are taken from the same k-means cluster analysis used in Figure 31, with third world nations represented by a triangle, developing nations by a square, and developed nations by a circle. The 'outliers' are represented by a star. Notice that the variables cluster into two dimensions: one for the health statistics and one for the health expenditures. The clusters are apparent even in this simple two-dimensional reduction. The simplex scale apparent in Figure 29 can be traced approximately along a U from the upper right corner of the plot to the bottom middle, and then to the upper left.
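A biplot built from the correlation matrix can be computed from a singular value decomposition of the column-standardized data matrix, which gives the same principal components as the correlation matrix. The sketch below assumes NumPy and uses one common scaling (scores = U S, variable vectors = columns of V); the function name `biplot_coordinates` and the toy data are hypothetical, and Everitt describes other standardizations of the vectors.

```python
import numpy as np

def biplot_coordinates(data, n_components=2):
    """Return (scores, vectors) for a principal-components biplot.

    data: 2-D array-like, rows = cases, columns = variables.
    Columns are standardized to z scores, so the decomposition
    corresponds to principal components of the correlation matrix.
    """
    x = np.asarray(data, dtype=float)
    # Standardize each column (variable) to mean 0 and standard deviation 1.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Z = U S V^T; the rows of vt are the component directions.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # points for the cases
    vectors = vt[:n_components].T                      # arrows for the variables
    return scores, vectors

# Hypothetical toy data: 5 cases measured on 3 variables.
toy = [[1.0, 2.0, 0.5],
       [2.0, 1.0, 1.5],
       [3.0, 4.0, 2.0],
       [4.0, 3.0, 3.5],
       [5.0, 6.0, 4.0]]
scores, vectors = biplot_coordinates(toy)
print(scores.shape, vectors.shape)
```

Plotting the rows of `scores` as symbols and the rows of `vectors` as arrows from the origin (usually rescaled so both fit on the same axes) gives the biplot; the rescaling factor is a presentation choice rather than part of the decomposition.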


Because of the way we have standardized the data within columns, we cannot make comparisons across the spending variables to identify spending priorities within countries. By standardizing within countries (rows) we can examine how countries allocate their resources. We could standardize this way and then display the data matrix directly as in Figure 29, but there is another graph which is more effective. If we limit the analysis to the three spending variables, three dimensions can be collapsed back to two when the variables are constrained: once each country's three spending figures are expressed as proportions of their sum, any two proportions determine the third.
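The difference between standardizing within columns and within rows can be made concrete with a small sketch. Column z scores (the scaling used for the parallel co-ordinate plot in Figure 31) place each variable on a common scale across countries; row proportions instead express each country's spending as shares of its own total. The function names and the three-country figures below are hypothetical.

```python
from statistics import mean, stdev

def column_z_scores(rows):
    """Standardize each column (variable) to mean 0 and SD 1 across cases."""
    cols = list(zip(*rows))
    stats = [(mean(c), stdev(c)) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def row_proportions(rows):
    """Rescale each row (case) so that its entries sum to 1."""
    return [[v / sum(row) for v in row] for row in rows]

# Hypothetical per-capita spending: (military, health, education).
spending = [[30.0, 10.0, 10.0],
            [10.0, 20.0, 70.0],
            [20.0, 40.0, 40.0]]

print(column_z_scores(spending)[0])   # country 1 relative to the other countries
print(row_proportions(spending)[0])   # country 1's own allocation: [0.6, 0.2, 0.2]
```

Column z scores answer "how does this country compare with others on military spending?", while row proportions answer "what share of its own budget does this country give to the military?", which is the question the triangle plot addresses.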

Figure 31 Parallel co-ordinate plot of birth, death, and social expenditure variables

The question we are addressing involves priorities among social expenditures. Of the amount spent per capita on military, health, and education, what is the proportional allocation within each country? A triangle plot helps answer the question, as Figure 34 shows. The same countries featured in earlier plots are labelled here. Notice that Uganda, Libya, and Ethiopia spend the major share of their education-health-military budget on the military. Algeria, Venezuela, and other countries at the top of the plot concentrate on education.
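A triangle (ternary) plot places each country inside an equilateral triangle whose corners stand for the three spending categories; the closer a point lies to a corner, the larger that category's share. The sketch below puts education at the top corner, consistent with the description of Figure 34, but the placement of military and health at the lower corners is an assumption, and the function name `ternary_xy` is hypothetical.

```python
import math

def ternary_xy(military, health, education):
    """Map three nonnegative amounts to (x, y) inside an equilateral triangle.

    Only the proportions matter, so the amounts are first rescaled to sum
    to 1; the point is then the share-weighted average of the corners.
    """
    total = military + health + education
    if total <= 0:
        raise ValueError("at least one amount must be positive")
    # Assumed corner layout: education at the top, the others at the base.
    corners = {"military": (0.0, 0.0),
               "health": (1.0, 0.0),
               "education": (0.5, math.sqrt(3) / 2)}
    shares = {"military": military / total,
              "health": health / total,
              "education": education / total}
    # Barycentric combination of the corner coordinates.
    x = sum(shares[k] * corners[k][0] for k in corners)
    y = sum(shares[k] * corners[k][1] for k in corners)
    return x, y

print(ternary_xy(0, 0, 5))     # all education: the top corner, about (0.5, 0.866)
print(ternary_xy(10, 10, 10))  # equal shares: the centroid, about (0.5, 0.289)
```

Because the amounts are divided by their total first, two countries with the same allocation but very different budgets plot at the same point, which is exactly the reduction from three dimensions to two described above.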


Figure 32 Fourier plot of birth, death, and social expenditure variables
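Andrews' construction behind the Fourier plots of Figure 32 assigns the first variable to a constant term and successive pairs of variables to sine and cosine terms of increasing frequency: f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., plotted for t between -π and π. The sketch below evaluates that function on a grid; the function names and the three standardized cases are hypothetical.

```python
import math

def andrews_curve(x, t):
    """Evaluate Andrews' function f_x(t) for one case x = (x1, ..., xp).

    f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ...
    """
    total = x[0] / math.sqrt(2)
    for i, value in enumerate(x[1:]):
        k = i // 2 + 1                    # frequencies 1, 1, 2, 2, 3, 3, ...
        wave = math.sin(k * t) if i % 2 == 0 else math.cos(k * t)
        total += value * wave
    return total

def andrews_profile(x, n_points=101):
    """Sample f_x(t) at evenly spaced t in [-pi, pi] for plotting."""
    ts = [-math.pi + 2 * math.pi * j / (n_points - 1) for j in range(n_points)]
    return ts, [andrews_curve(x, t) for t in ts]

# Hypothetical standardized scores: two similar cases and one dissimilar case.
cases = {"A": [0.5, -1.0, 0.2, 1.1],
         "B": [0.6, -0.9, 0.1, 1.0],
         "C": [-1.5, 2.0, -0.8, -1.2]}
for name, z in cases.items():
    ts, ys = andrews_profile(z)
    print(name, round(ys[50], 3))         # value near t = 0
```

The property the text relies on follows from the orthogonality of the sine and cosine terms: the integral over [-π, π] of the squared difference between two curves equals π times the squared Euclidean distance between the two cases, so cases A and B above trace nearly identical profiles while C departs from them.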

Figure 33 Biplot of birth, death, and social expenditure variables


Figure 34 Triangular plot of social expenditure variables

5 Conclusion

Almost every graph in this article is two-dimensional. All are black and white. And almost every one includes the raw data. In a world of photorealistic computer animation, this is not heady stuff. Computer graphics software is marketed today mostly on the basis of 3-D, realism, and perspective features. None of these attributes is especially useful for scientific presentation, however, despite the colourful examples in marketing literature.

On the other hand, few of the graphs in this article could be produced without a computer. Even simple graphs like tallies and histograms require intensive calculations to determine optimal values for symbol sizes and bar widths in the general case. We must be careful not to confuse the simplicity of an image with the sophistication of the calculations behind it. In many cases, the relation between the two is complementary. In fact, from the computational point of view, the simplest operation is to assign a colour table to a variable to produce an attractive image; most modern computer operating systems offer this as a system call. Much more complicated is the decision concerning how many lines to use in a stem and leaf diagram for a given dataset.

For scientific publications, we should emphasize accurate communication of information. This emphasis most frequently calls for one- or two-dimensional graphs. Three-dimensional graphs can be useful when an understanding of intrinsic volume or surface is needed to integrate the information, or when animation is available, as in rotation programs. Wherever possible, however, display the smooth and the data.
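As an illustration of the calculation hidden behind even a simple histogram, two classical rules listed in the references (Sturges and Scott) choose the bars from the data: Sturges' rule uses about 1 + log2 n classes, and Scott's rule uses a bin width of 3.49 s n^(-1/3), where s is the sample standard deviation. The sketch below is a minimal version under stated assumptions; rounding the class count up is a common convention, not necessarily the one SYSTAT uses, and the function names and sample are hypothetical.

```python
import math
from statistics import stdev

def sturges_bins(n):
    """Sturges' rule: about 1 + log2(n) classes (rounded up here)."""
    return math.ceil(1 + math.log2(n))

def scott_width(data):
    """Scott's rule: bin width 3.49 * s * n**(-1/3), with s the sample SD."""
    n = len(data)
    return 3.49 * stdev(data) * n ** (-1 / 3)

def scott_bins(data):
    """Number of bins implied by Scott's width over the range of the data."""
    width = scott_width(data)
    return max(1, math.ceil((max(data) - min(data)) / width))

data = [float(v) for v in range(1, 101)]   # hypothetical sample, n = 100
print(sturges_bins(len(data)))              # 1 + log2(100) is about 7.64, so 8
print(round(scott_width(data), 2))
print(scott_bins(data))
```

The two rules disagree on this sample (8 classes versus 5), which illustrates the point that choosing bar widths is a genuine computational decision rather than a trivial one.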

Acknowledgements

The data in this paper were adapted from a UN databank. The World Game Institute (University City Science Center, 3508 Market Street, Philadelphia, PA 19104; telephone (215) 387-0220) distributes a file containing these and other variables via floppy disk.

All the graphs in this paper were produced with SYSTAT/SYGRAPH.84 A file containing the statements for producing them can be obtained from the author. Mary Ann Hill and Laszlo Engelman contributed valuable suggestions.


References

1 Cleveland WS. Graphs in scientific publications. The American Statistician 1984; 38: 261-69.

2 Beniger JR, Robyn DL. Quantitative graphics in statistics: a brief history. The American Statistician 1978; 32: 1-11.

3 Fienberg S. Graphical methods in statistics. The American Statistician 1979; 33: 165-78.

4 Snee RD, Pfeifer CG. Graphical representation of data. In: Kotz S, Johnson NL eds, Encyclopedia of statistical sciences. New York: John Wiley, 1983.

5 Gabriel KR. Multivariate graphics. In: Kotz S, Johnson NL eds, Encyclopedia of statistical sciences. New York: John Wiley, 1985.

6 Wainer H, Thissen D. Graphical data analysis. Annual Review of Psychology 1981; 32: 191-241.

7 Cleveland WS. Research in statistical graphics. Journal of the American Statistical Association 1987; 82: 419-23.

8 Tukey JW. Data-based graphics: visual display in the decades to come. Statistical Science 1990; 5: 327-39.

9 Ehrenberg ASC. Data reduction: analyzing and interpreting statistical data. New York: John Wiley, 1975.

10 Gnanadesikan R. Methods for statistical data analysis of multivariate observations. New York: John Wiley, 1977.

11 Everitt B. Graphical techniques for multivariate data. London: Heinemann, 1978.

12 Bertin J. Semiology of graphics. Madison WI: University of Wisconsin Press, 1983. Translation of Semiologie graphique. Paris: Gauthier-Villars, 1973.

13 Schmid C. Statistical graphics: design principles and practices. New York: John Wiley, 1983.

14 Chambers J, Cleveland W, Kleiner B, Tukey P. Graphical methods for data analysis. Monterey CA: Wadsworth, 1983.

15 Tufte E. The visual display of quantitative information. Cheshire CT: Graphics Press, 1983.

16 Tufte E. Envisioning information. Cheshire CT: Graphics Press, 1990.

17 White JV. Using charts and graphs. New York: RR Bowker, 1984.

18 Cleveland WS. The elements of graphing data. Monterey CA: Wadsworth, 1985.

19 Velleman PF, Hoaglin DC. Applications, basics, and computing of exploratory data analysis. Boston: Duxbury Press, 1981.

20 Buja A, Fowlkes EB, Keramidas EM, Kettenring JR, Lee JC, Swayne DF, Tukey PA. Discovering features of multivariate data through statistical graphics. In: Proceedings of the Section on Statistical Graphics. Alexandria VA: American Statistical Association, 1986: 98-103.

21 Donoho AW, Donoho DL, Gasko M. MACSPIN: dynamic graphical data analysis on a desktop computer (the Apple Macintosh). In: Proceedings of the Section on Statistical Graphics. Alexandria VA: American Statistical Association, 1986: 86-91.

22 McDonald JA, Pederson J. Computing environments for data analysis. I: Introduction. SIAM Journal of Scientific and Statistical Computing 1985; 6: 1004-12.

23 McDonald JA, Pederson J. Computing environments for data analysis. II: Hardware. SIAM Journal of Scientific and Statistical Computing 1985; 6: 1013-2.

24 McDonald JA, Pederson J. Computing environments for data analysis. III: Programming environments. SIAM Journal of Scientific and Statistical Computing 1988; 9: 380-400.

25 Stuetzle W. Plot windows. Journal of the American Statistical Association 1987; 82: 466-75.

26 Becker RA, Cleveland WS, Wilks AR. Dynamic graphics for data analysis. Statistical Science 1987; 2: 355-95.

27 Cleveland WS, McGill ME. Dynamic graphics for statistics. Monterey CA: Wadsworth, 1987.

28 Scott DW. Statistics in motion: where is it going? In: Proceedings of the Section on Statistical Graphics. Alexandria VA: American Statistical Association, 1989: 17-22.

29 Haslett J, Bradley R, Craig P, Unwin A, Wills G. Dynamic graphics for exploring spatial data with application to locating global and local anomalies. The American Statistician 1991; 45: 234-42.

30 Tierney L. LISP-STAT. New York: John Wiley, 1991.

31 Weihs C, Schmidli H. OMEGA (online multivariate exploratory graphical analysis): routine searching for structure. Statistical Science 1990; 5: 175-208.

32 Kosslyn SM. Image and mind. Cambridge MA: Harvard University Press, 1980.

33 Kosslyn SM. Graphics and human information processing. Journal of the American Statistical Association 1985; 80: 499-512.

34 Pinker S. A theory of graph comprehension. In: Friedle R ed, Artificial intelligence and the future of testing. Norwood NJ: Ablex, 1990.

35 Simkin D, Hastie R. An information-processing analysis of graph perception. Journal of the American Statistical Association 1987; 82: 454-65.

36 Lewandowsky S, Spence I. Discriminating strata in scatterplots. Journal of the American Statistical Association 1989; 84: 682-88.

37 Spence I. Visual psychophysics of simple graphical elements. Journal of Experimental Psychology: Human Perception and Performance 1990; 16: 683-92.

38 Spence I, Lewandowsky S. Graphical perception. In: Fox J, Long JS eds, Modern methods of data analysis. Newbury Park CA: Sage Publications, 1990: 13-57.

39 Wilkinson L. An experimental evaluation of multivariate graphical point representations. Human Factors in Computer Systems: Proceedings. Gaithersburg MD, 1982: 202-9.

40 Haber R, Wilkinson L. Perceptual components of computer displays. IEEE Computer Graphics and Applications 1982; 2: 23-35.

41 Wilkinson L, McConathy D. Memory for graphs. In: Proceedings of the Section on Statistical Graphics. Alexandria VA: American Statistical Association, 1990: 25-32.

42 Hochberg J, Krantz DH. Perceptual properties of statistical graphs. In: Proceedings of the Section on Statistical Graphics. Alexandria VA: American Statistical Association, 1986: 29-35.

43 Cleveland WS. A model for graphical perception. In: Proceedings of the Section on Statistical Graphics. Alexandria VA: American Statistical Association, 1990: 1-24.

44 Cleveland WS, Diaconis P, McGill R. Variables on scatterplots look more highly correlated when the scales are increased. Science 1982; 216: 1138-41.

45 Cleveland WS, McGill R. A color-caused optical illusion on a statistical graph. The American Statistician 1983; 37: 101-105.

46 Cleveland WS, McGill R. Graphical perception: theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association 1984; 79: 531-54.

47 Cleveland WS, McGill R. The many faces of a scatterplot. Journal of the American Statistical Association 1984; 79: 807-22.

48 Cleveland WS, McGill R. Graphical perception and graphical methods for analyzing and presenting scientific data. Science 1985; 229: 828-33.

49 Cleveland WS, McGill ME, McGill R. The shape parameter of a two-variable graph. Journal of the American Statistical Association 1988; 83: 289-300.

50 Tukey JW. Exploratory data analysis. Reading MA: Addison-Wesley, 1977.

51 Sturges HA. The choice of a class interval. Journal of the American Statistical Association 1926; 21: 65.

52 Doane DP. Aesthetic frequency classifications. The American Statistician 1976; 30: 181-83.

53 Scott DW. Optimal and data-based histograms. Biometrika 1979; 66: 605-10.

54 Diaconis P, Freedman D. On the maximum deviation between the histogram and the underlying density. Zeitschrift für Wahrscheinlichkeitstheorie 1981; 57: 453-76.

55 Wilkinson L. Fuzzygrams. Cambridge MA: Harvard Computer Graphics Week, 1983.

56 Tarter ME, Kronmal RA. An introduction to the implementation and theory of nonparametric density estimation. The American Statistician 1976; 30: 105-12.

57 Silverman BW. Density estimation for statistics and data analysis. New York: Chapman & Hall, 1986.

58 Wegman EJ. Density estimation. In: Kotz S, Johnson NL eds, Encyclopedia of statistical sciences, volume 2. New York: John Wiley, 1982: 309-15.

59 Wand MP, Marron JS, Ruppert D. Transformations in density estimation. Journal of the American Statistical Association 1991; 86: 343-53.

60 Scott DW. Averaged shifted histograms: effective nonparametric density estimators in several dimensions. The Annals of Statistics 1985; 13: 1024-40.

61 Frigge M, Hoaglin DC, Iglewicz B. Some implementations of the boxplot. The American Statistician 1989; 43: 50-4.

62 Shapiro SS, Wilk MB. An analysis of variance test for normality (complete samples). Biometrika 1965; 52: 591-611.

63 Iman R. Graphs for use with the Lilliefors test for normal and exponential distributions. The American Statistician 1982; 36: 109-12.

64 Dallal GE, Finseth K. Double dual histograms. The American Statistician 1977; 31: 39-41.

65 Krieg AF, Beck JR, Bongiovanni MB. The dot plot: a starting point for evaluating test performance. Journal of the American Medical Association 1988; 260: 3309-12.

66 Lancaster P, Salkauskas K. Curve and surface fitting. London: Academic Press, 1986.

67 Friedman JH. Exploratory projection pursuit. Journal of the American Statistical Association 1987; 82: 249-66.

68 Friedman JH, Stuetzle W. Projection pursuit regression. Journal of the American Statistical Association 1981; 76: 817-23.

69 Cleveland WS, Devlin S. Locally weighted regression: an approach to regression analysis by local fitting. Journal of the American Statistical Association 1988; 83: 596-640.

70 Gu C. Adaptive spline smoothing in non-Gaussian regression models. Journal of the American Statistical Association 1990; 85: 801-807.

71 Breiman L. The π method for estimating multivariate functions from noisy data. Technometrics 1991; 33: 125-43.

72 Cleveland WS. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 1979; 74: 829-36.

73 Hartigan JA. Printer graphics for clustering. Journal of Statistical Computation and Simulation 1975; 4: 187-213.

74 Becker RA, Cleveland WS. Take a broader view of scientific visualization. Pixel 1991; 2: 42-44.

75 Hartigan JA. Clustering algorithms. New York: John Wiley, 1975.

76 Wilkinson L. Permuting a matrix to a simple structure. Proceedings of the American Statistical Association, 1978.

77 Ling RF. A computer generated aid for cluster analysis. Communications of the ACM 1973; 16: 355-61.

78 Chernoff H. The use of faces to represent points in k-dimensional space graphically. Journal of the American Statistical Association 1973; 68: 361-68.

79 Freni-Titulaer LWJ, Louv WC. Comparisons of some graphical methods for exploratory multivariate data analysis. The American Statistician 1984; 38: 184-88.

80 Wegman EJ. Hyperdimensional data analysis using parallel coordinates. George Mason University Center for Computational Statistics and Probability Technical Report No. 1, 1986.

81 Inselberg A. Discovering multi-dimensional structure using parallel coordinates. In: Proceedings of the Section on Statistical Graphics. Alexandria VA: American Statistical Association, 1989: 1-16.

82 Andrews DF. Plots of high dimensional data. Biometrics 1972; 28: 125-36.

83 Gabriel KR. The biplot graphic display of matrices with applications to principal components analysis. Biometrika 1971; 58: 453-67.

84 Wilkinson L. SYSTAT. Evanston IL: SYSTAT, Inc, 1990.
