15
ACT 4 453 28.42 4.69 29 28.63 4.45 15 36 21 0.22 SATV 5 453 610.66 112.31 620 617.91 103.78 200 800 600 5.28 SATQ 6 442 596.00 113.07 600 602.21 133.43 200 800 600 5.38 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities. There are at least three different graphics options available, only one of which, base graphics will be discussed here. The others are lattice graphics and ggobi (based upon the grammar of graphics Wilkinson (2005). Although not all threats to inference can be detected graphically, one of the most powerful statistical tests for non-linearity and outliers is the well known but not often used “inter- occular trauma test”. A classic example of the need to examine one’s data for the effect of non-linearity and the effect of outliers is the data set of Anscombe (1973) which is included as the data(anscombe) data set. The data set is striking for it shows four patterns of results, with equal regressions and equal descriptive statistics. The graphs differ drastically in appearance for one actually has a curvilinear relationship, two have one extreme score, and one shows the expected pattern. Anscombe’s discussion of the importance of graphs is just as timely now as it was 35 years ago: Graphs can have various purposes, such as (i) to help us perceive and appreciate some broad features of the data, (ii) to let us look behind these broad features and see what else is there. Most kinds of statistical calculaton rest on assump- tions about the behavior of the data. Those assumptions may be false, and the calculations may be misleading. We ought always to try to check whether the assumptions are reasonably correct; and if they are wrong we ought to be able to perceive in what ways the are wrong. Graphs are very valuable for these purposes. (Anscombe, 1973, p 17). 6.1 The Scatter Plot Matrix (SPLOM) The problem with suggesting looking at scatter plots of the data is the number of such plots grows by the square of the number of variables. A solution is the scatter plot matrix (SPLOM) available in the pairs.panels function which is based upon the pairs func- tion. pairs.panels show the all the pairwise relationships, as well as histograms of the individual variables. Additional output includes the Pearson Product Moment Correlation Coefficient , the locally weighted polynomial regression (LOWESS), and a density curve for 26

6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

  • Upload
    others

  • View
    14

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

ACT 4 453 28.42 4.69 29 28.63 4.45 15 36 21 0.22SATV 5 453 610.66 112.31 620 617.91 103.78 200 800 600 5.28SATQ 6 442 596.00 113.07 600 602.21 133.43 200 800 600 5.38

6 Day 2: Graphical data displays and Exploratory DataAnalysis

A compelling reason to use R is for its graphics capabilities. There are at least threedi!erent graphics options available, only one of which, base graphics will be discussed here.The others are lattice graphics and ggobi (based upon the grammar of graphics Wilkinson(2005).

Although not all threats to inference can be detected graphically, one of the most powerfulstatistical tests for non-linearity and outliers is the well known but not often used “inter-occular trauma test”. A classic example of the need to examine one’s data for the e!ect ofnon-linearity and the e!ect of outliers is the data set of Anscombe (1973) which is includedas the data(anscombe) data set. The data set is striking for it shows four patterns ofresults, with equal regressions and equal descriptive statistics. The graphs di!er drasticallyin appearance for one actually has a curvilinear relationship, two have one extreme score,and one shows the expected pattern. Anscombe’s discussion of the importance of graphsis just as timely now as it was 35 years ago:

Graphs can have various purposes, such as (i) to help us perceive and appreciatesome broad features of the data, (ii) to let us look behind these broad featuresand see what else is there. Most kinds of statistical calculaton rest on assump-tions about the behavior of the data. Those assumptions may be false, and thecalculations may be misleading. We ought always to try to check whether theassumptions are reasonably correct; and if they are wrong we ought to be ableto perceive in what ways the are wrong. Graphs are very valuable for thesepurposes. (Anscombe, 1973, p 17).

6.1 The Scatter Plot Matrix (SPLOM)

The problem with suggesting looking at scatter plots of the data is the number of suchplots grows by the square of the number of variables. A solution is the scatter plot matrix(SPLOM) available in the pairs.panels function which is based upon the pairs func-tion. pairs.panels show the all the pairwise relationships, as well as histograms of theindividual variables. Additional output includes the Pearson Product Moment CorrelationCoe!cient , the locally weighted polynomial regression (LOWESS), and a density curve for

26

Page 2: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

each variable (Figure 5). This kind of graph is particularly useful for less than about 10variables. Students in an introductory methods course do not seem to realize that this isunusual way of plotting data.

> data(sat.act)

> pairs.panels(sat.act)

gender

0 2 4

0.09 −0.02

5 15 30

−0.04 −0.02

200 500 800

1.0

1.4

1.8

−0.17

02

4

●●●

● ●

●● ●

●● ●

●●

●●

●●

●●

● ●

●●●

●● ●

● ●●

●● ●

●●

●●

●●

●● ●●●●

●● ●

●●

● ●

● ●

● ●

● ●●

●● ●

● ●

●●●

●●

●●

●●●

●●●●

●●●

●● ●●

●●

●● ●

●●

● ●●

●●

●●

● ●

● ●

● ●●

●●

● ●

● ●

●●●

●●

●●

●●

●●

● ●

●● ●

●●●

●●●●●●●

● ●

●●

●●●

●●

●●●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●●●

●●

● ●

●●

●●●

●● ●●●●●

●●●

●●

●●●●

●●

●●

●●

● ●

●●

●●

●●●●

●●●

●●

●●

●● ●

●● ●

● ●●●

●●

●●●●

●●●

●●

● ●

●●●

●●

●●

●● ●● ●●

●●

●●

● ●

●●

●●●

●●

●●●●

●● ●●

●●● ●●● ●●●

●● ●●●●●

●●● ●●●●●● ●●

●●●●●

●●

●●

●●

●●

●●●

●●

●●

●● ●●●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●● ●

●●

●●●

education

0.55 0.15 0.05 0.03

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●● ●●

●●●

●●●

●●

●●

●●●

●●

●●

●●●●

●● ●●

●● ●

●●●●

●●

●●●

●●

●●

●●●●●

●●

●●●

●●

●●● ●●

●●

●●●

●●

●● ●

●● ●

●●●● ●

●●●

●●

● ●

●●

●●

●●

●●●●

●●●●● ●

●●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●●●

●●

●●●

●●●●●●●●●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●●●●●

●●

●●●●

●● ●●●●●

●●

●●● ●●● ●●●

●●

●● ●●

●●

●●●●

●●●●●

●●● ●●

●●●

●●●●

●●●●

●● ●● ●●

● ●●

●●

●●

●●

● ●●

●●●●

●●

●●

●● ●●

●● ●●●●

●●●●

●● ●●●●●●●● ●●● ●●●●

●● ●●●●● ●●●● ●●●●●● ●●●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●

●●●

●●

●●

● ●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●●●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●●

●●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●● ●●●

●●

●●

●●●

●●

●●●

●●

●●

●●●●

●●●●

●●●

●●●●

●●

●● ●

●●

●●

●●● ●●

●●

●●●

●●

● ●●● ●

●●

●●

●● ●

●● ●

●●●

●●●●●

●●●

●●

●●

●●

●●

●●

●●●●

●●

●●●●

●●●

●●

●●

●●

●●●

●●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●● ●

●●●●

●●●●●●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●● ●●●●

●●

●●●●

●● ●●●●●

●●

●●●●● ●●●●

●●

●●●●

●●

●●

●●

●●●●●

●● ●●●

●●●

● ●●●

●●●●

●●●●●●

●●●

●●

●●

●●

● ●●

●●●●

●●

●●

●● ●●

●● ●●●●

●●

●●●

●● ●●●●● ●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●

●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

age

0.11 −0.04

2040

60

−0.03

515

30

● ●

●●

●●

●●●●●● ●

●●

●●

● ●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

● ●

●●

●●

●●

●● ●●

●●

●●

● ●

● ●

●●

● ●●●

●●

● ●●

●●

●●

●●

●●●

● ●

●●

● ●

●●

●●

●●●

●●●

●● ●●

●●

● ●●●●●● ●

● ●●

●●●

●●

●●

●●●● ●

● ●

●●●

●●

●●

●●●

●●

●●●● ●

●●

●●●

●●

●●

●●

●●

●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●●

●●●

●●

●●

●●●

●●● ●

●●

●●

● ●

●●●

●●

●●●●

●●

● ●

● ●

●●

●●

●● ●

● ●

●●

●●

●●

●●

●●

● ●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●●

●●

●●

●●

●●●

●●

● ●

●●●

●●

●●

●●

●●●● ●

●●●

●●●●●

● ●●

●●

●●

● ●

●●●●

●●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●●●●

●●

●●

● ●

● ●

● ●

●●●●

● ●

●●●

● ●●●

●●

●●

● ●

● ●

● ●

●●● ●●●●

●●

●●

●●●

●●

●●

●●

●●

● ●●

●●

●●

●●

● ●

● ●

●●

● ●

●●

● ●●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

● ●

● ●

●●

●●

●●

●●●

●● ●

● ●● ●

●●

●● ● ●●● ●●●●

●●

●●

●●

●●● ●●

● ●

●●●

● ●

●●

●●●

●●

●●

●● ●

●●

●● ●

● ●

●●

●●

● ●

● ●●

●● ●

●●

● ●

●●

●●

●●

● ●

●●

● ●

●●

●●

● ●●●

●●●

●●●

●●●

●●

●●

●●●

●● ●●

●●

●●

●●

●● ●

●●

●●●●

●●

●●

● ●

●●

●●

●● ●

● ●

●●

● ●

●●

●●

●●

●●●

●●●

●●

●●

●●●●

●●

●●

● ●

●●

●●● ●

●●

●●

●●

●●

● ●

● ●

●●●

●●

●●

●●

●●●●●

●●●●

●●●●●

●●●

●●

●●

●●

●●● ●

●●

● ●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●●●●

●●

●●●

● ●

● ●

●●

●●● ●

●●

● ● ●

●● ●●

● ●

●●

● ●

● ●

● ●

●●● ●●● ●

●●

● ●

●● ●

● ●

●●

●●

●●

● ●●

●●

● ●

●●

● ●

●●

● ●

●●

●●

● ● ●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●●

●● ●

● ●● ●

●●

● ●● ●●● ●●

● ●●

●●●

●●

●●

● ●● ●●

●●

●●●

● ●

●●

● ●●

●●

●●●● ●

●●

●●●

●●

●●

●●

● ●

● ●●

●● ●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●●

●●

● ●●●

●●●

●●●

●●●

●●

●●

●●●

●● ●●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●●●●

●●

●●

●●●●●

●●●

●●

● ●

●●

● ●●●

●●

● ●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●●●●

●●

●●

● ●

● ●

●●

● ●● ●

● ●

● ●●

●●●●

● ●

●● ACT

0.56 0.59

●●●

●●

●●●

●●●●

● ●

●●●

●●

●●

●●

● ●

●●

●●●

● ●

●●●

●● ●

●●●

●●

●● ●

●●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

● ●● ●

●●

●●

●●●●

●●

●●

●●

●●

●●

●● ●●

●●

●● ●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●●

●●

●●

●●●

●● ●

●●

●●●

●●●●

●● ●

●●

●●

●●

●●

●●

●●●

●● ●

●●

●●

●●

●●●

●●

●●

●●

●● ●

●● ●

●●

●●●●

●●

●●●

●●

● ●

● ●●

●●●●

●●

●●●

●●

●●

●●●

●●●

●●

●●

●●● ●

●●●

●●

●●

●●

●●●

●●

●● ●

●●●●●

●●

●●

●●

●●

● ●

●●

●●●●● ●●

●●

●●

● ●

●●●

●●

●●

● ●●●●●

●●

●●●

●●●●

● ●

●●

●●

●●

●●

●●●

●●

●●●

●●●

●●

●●

●●

●●●●

●●

● ●●

●●

●●

●●

● ●●

●●● ●

●●

● ●●

●●

●●

●●

●●

● ●

●● ●

● ●

●●

●● ●

●●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●● ●

● ●

●●

●●

●●●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●● ●

●●

●●●

●●

●●

●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

● ●

●● ●

●●● ●

● ●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●●

●●

●●

●●

●●

●●●

●●

● ●

●●●

●●●

●●

●●●

●●

●●

●●●

●●●

● ●

●●

●●●●

●●●

●●

● ●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●●●

●●

●●

●●

●●●

●●

●●

● ●●●●●

●●

●●

●●●●

● ●

●●

●●

●●●

● ●

●●●

●●

●●●

●●●

●●

●●

●●

● ●● ●

● ●

●●●

●●

●●

●●

● ●●

●● ●●

● ●

● ●●

●●

●●

●●

●●

● ●

●●●

● ●

●●

●●●

● ●●

●●

●●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●● ●

●●

●●

●●

●●●●

●●

●●

●● ●

●●

●●

●●

●●

●●

● ●● ●

●●

●●●

●●

●●

● ●

● ●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

● ●

●● ●

●●● ●

● ●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●● ●

●●

● ●

●●●

●●●

●●

●●●

●●

●●

●●●

●●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●●●

●●

●●

●●

●●●

●●

●●

● ●●●●●

●●

●●

●●●●

● ●

●●

●●

●●●

● ●

●●●

●●

●●●

●● ●

●●●

●●

●●

●●● ●

● ●

●●●

●●

●●

●●

●● ●

●●●●

●●

●● ●

●●

● ●

●●

●●

● ●

●●●

● ●

●●

●●●

●●●

● ●

●●●

●● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●●●

● ●

●●

●●

●●● ●

●●

●●●●●●

●●

●●

●●

●●

●●

●●●●

● ●

●● ●

●●

●●

● ●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●●

●●

● ●●

●●●●

● ●●

● ●

● ●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

● ●

●● ●

●●●

● ●

●●

●●

●●

●●●

●●

● ●

●●●

● ●●

●●

●●●

●●

●●

●●●

●●●

● ●

●●

●●●●

●●

●●

●●

●●

●●

●●

●● ●

●●●●

●●

● ●

●●

●●

● ●

●●

●●●● ●●

● ●

●●

●●

●●●

●●

●●

● ●●●●

●●

●●●

●● ●●

●●

● ●

●●

●●

● ●

●●●

●●

●●●

●●●

●●

●●

●●

●●●●

●●

●●SATV

200

500

800

0.64

1.0 1.4 1.8

200

500

800

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●●●

● ●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●●

●●

●●

●● ●

●●

●● ●

●●

●●

●●

●●●

●● ●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●● ●●

●● ●

●●●

●●

●●

●●

●●

● ●●●

●●

●● ●

●●

●●

●● ●

●●

●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●● ●

●●

●●

●●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●

●●

● ●●

●●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

● ●

● ●

●●●●●

●●

● ●●●●●●

●●

●●●●

●●

●●

●●●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●●

●●●

●●●

●●

● ●

●●

●●●

● ●

● ●●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●●●●

●●

●●●

●●

●●●

●● ●

●●

●●

●● ●

● ●

● ●●

●●

●●

● ●

●●●

●● ●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●●●

●●●

●● ●

●●

●●

●●

● ●

●●●

●●

●●●

●●

● ●

●● ●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●● ●

● ●

●●

●●

●●

●●

● ●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●●●

● ●●

●●●

●●

●●●

●●

●●

●●

● ●

●●●●●

●●

●●●● ●●●

●●

●●●●

●●

●●

● ●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●●

●● ●

●●

● ●

●●

●●●

●●

●●

20 40 60

● ●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

● ●● ●●

●●

●●

●●

●●●

●●

●●

● ●●

●●

●●●

●●●

● ●

●●

●●●

● ●

●●●

●●

●●

● ●

●●●

● ●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

● ●

●●●●

●●

●● ●

●●

● ●

●● ●

● ●

●●●

●●

●●

●●

●●

●● ●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●●●

●●●

●●●

●●

●●●

●●

●●

●●

●●

●●●●●

●●

●●●●●●●

●●

●●●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

● ●●

●● ●

●●

● ●

●●

●●●

●●

●●● ●

●●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●● ●

●●●

●●

● ●

●●●

●●

●●●

●●

●●

●●

● ●●

● ●●

●● ●

●●●

●●

●●

●●

● ●●

●●

●●

●●●

●●●

●● ●

●●

●●

●●

●●

●●●●

●●

●●●

●●

● ●

●● ●

● ●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●● ●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

● ●

● ● ●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●●

●●

●●

●● ●

●●●

● ●●

●●●

●●

●●

●●

● ●●

●●

● ●

● ●

●●

●●●●●

●●

●●●

●●●●

●●

●●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

● ●●

●●●

●●

●●

●●

●●●

● ●

●●

200 500 800

● ●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●●●

● ●

●●

●●

● ●●

● ●

●●

●●●

●●

●●●

●● ●

●●

● ●

●●●

●●

●● ●

●●

●●

●●

●●●

●● ●

●● ●

●●●

●●

●●

●●

●●●

●●

●●

●● ●

●● ●

●●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●● ●

● ●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●● ●

●●

●●

●●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●

● ●

● ●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●●

●●●

●●●

●●

●●

●●

● ●●

● ●

●●

●●

●●

●●●●●

●●

●●●●●● ●

●●

●● ●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●●

●● ●

●●

● ●

●●

●● ●

● ●

●●SATQ

Figure 5: A SPLOM plot is a fast way of detecting non-linearities in the pair wise correla-tions as well as problems with distributions. For each cell below the diagonal, the x axisreflects the column variable, the y axis, the row variable.

6.2 Bars vs. Boxes

Many psychological graphs report means by using “bar graphs”. These are particularlyuninformative, for they carry no information about the amount of variability. Some then

27

Page 3: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

> data(sat.act)

> pairs.panels(sat.act, scale = TRUE)

gender

0 2 4

0.087 −0.021

5 15 30

−0.037 −0.019

200 500 800

1.0

1.4

1.8

−0.17

02

4

●●●

● ●

●● ●

●● ●

●●

●●

●●

●●

● ●

●●●

●● ●

● ●●

●● ●

●●

●●

●●

●● ●●●●

●● ●

●●

● ●

● ●

● ●

● ●●

●● ●

● ●

●●●

●●

●●

●●●

●●●●

●●●

●● ●●

●●

●● ●

●●

● ●●

●●

●●

● ●

● ●

● ●●

●●

● ●

● ●

●●●

●●

●●

●●

●●

● ●

●● ●

●●●

●●●●●●●

● ●

●●

●●●

●●

●●●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●●●

●●

● ●

●●

●●●

●● ●●●●●

●●●

●●

●●●●

●●

●●

●●

● ●

●●

●●

●●●●

●●●

●●

●●

●● ●

●● ●

● ●●●

●●

●●●●

●●●

●●

● ●

●●●

●●

●●

●● ●● ●●

●●

●●

● ●

●●

●●●

●●

●●●●

●● ●●

●●● ●●● ●●●

●● ●●●●●

●●● ●●●●●● ●●

●●●●●

●●

●●

●●

●●

●●●

●●

●●

●● ●●●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●● ●

●●

●●●

education0.55 0.15 0.046 0.035

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●● ●●

●●●

●●●

●●

●●

●●●

●●

●●

●●●●

●● ●●

●● ●

●●●●

●●

●●●

●●

●●

●●●●●

●●

●●●

●●

●●● ●●

●●

●●●

●●

●● ●

●● ●

●●●● ●

●●●

●●

● ●

●●

●●

●●

●●●●

●●●●● ●

●●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●●●

●●

●●●

●●●●●●●●●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●●●●●

●●

●●●●

●● ●●●●●

●●

●●● ●●● ●●●

●●

●● ●●

●●

●●●●

●●●●●

●●● ●●

●●●

●●●●

●●●●

●● ●● ●●

● ●●

●●

●●

●●

● ●●

●●●●

●●

●●

●● ●●

●● ●●●●

●●●●

●● ●●●●●●●● ●●● ●●●●

●● ●●●●● ●●●● ●●●●●● ●●●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●

●●●

●●

●●

● ●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●●●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●●

●●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●● ●●●

●●

●●

●●●

●●

●●●

●●

●●

●●●●

●●●●

●●●

●●●●

●●

●● ●

●●

●●

●●● ●●

●●

●●●

●●

● ●●● ●

●●

●●

●● ●

●● ●

●●●

●●●●●

●●●

●●

●●

●●

●●

●●

●●●●

●●

●●●●

●●●

●●

●●

●●

●●●

●●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●● ●

●●●●

●●●●●●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●● ●●●●

●●

●●●●

●● ●●●●●

●●

●●●●● ●●●●

●●

●●●●

●●

●●

●●

●●●●●

●● ●●●

●●●

● ●●●

●●●●

●●●●●●

●●●

●●

●●

●●

● ●●

●●●●

●●

●●

●● ●●

●● ●●●●

●●

●●●

●● ●●●●● ●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●

●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

age0.11 −0.042

2040

60

−0.034

515

30

● ●

●●

●●

●●●●●● ●

●●

●●

● ●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

● ●

●●

●●

●●

●● ●●

●●

●●

● ●

● ●

●●

● ●●●

●●

● ●●

●●

●●

●●

●●●

● ●

●●

● ●

●●

●●

●●●

●●●

●● ●●

●●

● ●●●●●● ●

● ●●

●●●

●●

●●

●●●● ●

● ●

●●●

●●

●●

●●●

●●

●●●● ●

●●

●●●

●●

●●

●●

●●

●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●●

●●●

●●

●●

●●●

●●● ●

●●

●●

● ●

●●●

●●

●●●●

●●

● ●

● ●

●●

●●

●● ●

● ●

●●

●●

●●

●●

●●

● ●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●●

●●

●●

●●

●●●

●●

● ●

●●●

●●

●●

●●

●●●● ●

●●●

●●●●●

● ●●

●●

●●

● ●

●●●●

●●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●●●●

●●

●●

● ●

● ●

● ●

●●●●

● ●

●●●

● ●●●

●●

●●

● ●

● ●

● ●

●●● ●●●●

●●

●●

●●●

●●

●●

●●

●●

● ●●

●●

●●

●●

● ●

● ●

●●

● ●

●●

● ●●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

● ●

● ●

●●

●●

●●

●●●

●● ●

● ●● ●

●●

●● ● ●●● ●●●●

●●

●●

●●

●●● ●●

● ●

●●●

● ●

●●

●●●

●●

●●

●● ●

●●

●● ●

● ●

●●

●●

● ●

● ●●

●● ●

●●

● ●

●●

●●

●●

● ●

●●

● ●

●●

●●

● ●●●

●●●

●●●

●●●

●●

●●

●●●

●● ●●

●●

●●

●●

●● ●

●●

●●●●

●●

●●

● ●

●●

●●

●● ●

● ●

●●

● ●

●●

●●

●●

●●●

●●●

●●

●●

●●●●

●●

●●

● ●

●●

●●● ●

●●

●●

●●

●●

● ●

● ●

●●●

●●

●●

●●

●●●●●

●●●●

●●●●●

●●●

●●

●●

●●

●●● ●

●●

● ●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●●●●

●●

●●●

● ●

● ●

●●

●●● ●

●●

● ● ●

●● ●●

● ●

●●

● ●

● ●

● ●

●●● ●●● ●

●●

● ●

●● ●

● ●

●●

●●

●●

● ●●

●●

● ●

●●

● ●

●●

● ●

●●

●●

● ● ●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●●

●● ●

● ●● ●

●●

● ●● ●●● ●●

● ●●

●●●

●●

●●

● ●● ●●

●●

●●●

● ●

●●

● ●●

●●

●●●● ●

●●

●●●

●●

●●

●●

● ●

● ●●

●● ●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●●

●●

● ●●●

●●●

●●●

●●●

●●

●●

●●●

●● ●●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●●●●

●●

●●

●●●●●

●●●

●●

● ●

●●

● ●●●

●●

● ●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●●●●

●●

●●

● ●

● ●

●●

● ●● ●

● ●

● ●●

●●●●

● ●

●● ACT

0.56 0.59

●●●

●●

●●●

●●●●

● ●

●●●

●●

●●

●●

● ●

●●

●●●

● ●

●●●

●● ●

●●●

●●

●● ●

●●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

● ●● ●

●●

●●

●●●●

●●

●●

●●

●●

●●

●● ●●

●●

●● ●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●●

●●

●●

●●●

●● ●

●●

●●●

●●●●

●● ●

●●

●●

●●

●●

●●

●●●

●● ●

●●

●●

●●

●●●

●●

●●

●●

●● ●

●● ●

●●

●●●●

●●

●●●

●●

● ●

● ●●

●●●●

●●

●●●

●●

●●

●●●

●●●

●●

●●

●●● ●

●●●

●●

●●

●●

●●●

●●

●● ●

●●●●●

●●

●●

●●

●●

● ●

●●

●●●●● ●●

●●

●●

● ●

●●●

●●

●●

● ●●●●●

●●

●●●

●●●●

● ●

●●

●●

●●

●●

●●●

●●

●●●

●●●

●●

●●

●●

●●●●

●●

● ●●

●●

●●

●●

● ●●

●●● ●

●●

● ●●

●●

●●

●●

●●

● ●

●● ●

● ●

●●

●● ●

●●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●● ●

● ●

●●

●●

●●●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●● ●

●●

●●●

●●

●●

●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

● ●

●● ●

●●● ●

● ●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●●

●●

●●

●●

●●

●●●

●●

● ●

●●●

●●●

●●

●●●

●●

●●

●●●

●●●

● ●

●●

●●●●

●●●

●●

● ●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●●●

●●

●●

●●

●●●

●●

●●

● ●●●●●

●●

●●

●●●●

● ●

●●

●●

●●●

● ●

●●●

●●

●●●

●●●

●●

●●

●●

● ●● ●

● ●

●●●

●●

●●

●●

● ●●

●● ●●

● ●

● ●●

●●

●●

●●

●●

● ●

●●●

● ●

●●

●●●

● ●●

●●

●●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●● ●

●●

●●

●●

●●●●

●●

●●

●● ●

●●

●●

●●

●●

●●

● ●● ●

●●

●●●

●●

●●

● ●

● ●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

● ●

●● ●

●●● ●

● ●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●● ●

●●

● ●

●●●

●●●

●●

●●●

●●

●●

●●●

●●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●●●

●●

●●

●●

●●●

●●

●●

● ●●●●●

●●

●●

●●●●

● ●

●●

●●

●●●

● ●

●●●

●●

●●●

●● ●

●●●

●●

●●

●●● ●

● ●

●●●

●●

●●

●●

●● ●

●●●●

●●

●● ●

●●

● ●

●●

●●

● ●

●●●

● ●

●●

●●●

●●●

● ●

●●●

●● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●●●

● ●

●●

●●

●●● ●

●●

●●●●●●

●●

●●

●●

●●

●●

●●●●

● ●

●● ●

●●

●●

● ●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●●

●●

● ●●

●●●●

● ●●

● ●

● ●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

● ●

●● ●

●●●

● ●

●●

●●

●●

●●●

●●

● ●

●●●

● ●●

●●

●●●

●●

●●

●●●

●●●

● ●

●●

●●●●

●●

●●

●●

●●

●●

●●

●● ●

●●●●

●●

● ●

●●

●●

● ●

●●

●●●● ●●

● ●

●●

●●

●●●

●●

●●

● ●●●●

●●

●●●

●● ●●

●●

● ●

●●

●●

● ●

●●●

●●

●●●

●●●

●●

●●

●●

●●●●

●●

●●SATV

200

500

800

0.64

1.0 1.4 1.8

200

500

800

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●●●

● ●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●●

●●

●●

●● ●

●●

●● ●

●●

●●

●●

●●●

●● ●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●● ●●

●● ●

●●●

●●

●●

●●

●●

● ●●●

●●

●● ●

●●

●●

●● ●

●●

●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●● ●

●●

●●

●●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●

●●

● ●●

●●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

● ●

● ●

●●●●●

●●

● ●●●●●●

●●

●●●●

●●

●●

●●●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●●

●●●

●●●

●●

● ●

●●

●●●

● ●

● ●●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●●●●

●●

●●●

●●

●●●

●● ●

●●

●●

●● ●

● ●

● ●●

●●

●●

● ●

●●●

●● ●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●●●

●●●

●● ●

●●

●●

●●

● ●

●●●

●●

●●●

●●

● ●

●● ●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●● ●

● ●

●●

●●

●●

●●

● ●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●●●

● ●●

●●●

●●

●●●

●●

●●

●●

● ●

●●●●●

●●

●●●● ●●●

●●

●●●●

●●

●●

● ●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●●

●● ●

●●

● ●

●●

●●●

●●

●●

20 40 60

● ●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

● ●● ●●

●●

●●

●●

●●●

●●

●●

● ●●

●●

●●●

●●●

● ●

●●

●●●

● ●

●●●

●●

●●

● ●

●●●

● ●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

● ●

●●●●

●●

●● ●

●●

● ●

●● ●

● ●

●●●

●●

●●

●●

●●

●● ●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●●●

●●●

●●●

●●

●●●

●●

●●

●●

●●

●●●●●

●●

●●●●●●●

●●

●●●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

● ●●

●● ●

●●

● ●

●●

●●●

●●

●●● ●

●●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●● ●

●●●

●●

● ●

●●●

●●

●●●

●●

●●

●●

● ●●

● ●●

●● ●

●●●

●●

●●

●●

● ●●

●●

●●

●●●

●●●

●● ●

●●

●●

●●

●●

●●●●

●●

●●●

●●

● ●

●● ●

● ●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●● ●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

● ●

● ● ●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●●

●●

●●

●● ●

●●●

● ●●

●●●

●●

●●

●●

● ●●

●●

● ●

● ●

●●

●●●●●

●●

●●●

●●●●

●●

●●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

● ●●

●●●

●●

●●

●●

●●●

● ●

●●

200 500 800

● ●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●●●

● ●

●●

●●

● ●●

● ●

●●

●●●

●●

●●●

●● ●

●●

● ●

●●●

●●

●● ●

●●

●●

●●

●●●

●● ●

●● ●

●●●

●●

●●

●●

●●●

●●

●●

●● ●

●● ●

●●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●● ●

● ●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●● ●

●●

●●

●●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●

● ●

● ●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●●

●●●

●●●

●●

●●

●●

● ●●

● ●

●●

●●

●●

●●●●●

●●

●●●●●● ●

●●

●● ●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●●

●● ●

●●

● ●

●●

●● ●

● ●

●●SATQ

Figure 6: A probably not useful option in pairs.panels, is to scale the font size of thecorrelations to reflect their magnitude. More a demonstration of the range of possibilitiesin R graphics rather than a useful option.

28

Page 4: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

add error bars to form “dynamite plots”, which are slightly more informative. A muchmore useful graphic that displays the median, interquartile range, and the 99% confidenceintervals is the boxplot. For small data sets, showing the actual data points, with orwithout error bars is easy to do using the stripchart function.

Consider the following four data sets. What is the best way to describe their di!er-ences?

> set.seed(42)

> x1 <- c(sample(5, 20, replace = TRUE) + 3, rep(NA, 30))

> x2 <- c(sample(10, 20, replace = TRUE), rep(NA, 30))

> x3 <- c(sample(5, 10, replace = TRUE), sample(5, 10, replace = TRUE) + 10, rep(NA, 30))

> x4 <- sample(10, 50, replace = TRUE)

> X.df <- data.frame(x1, x2, x3, x4)

6.3 Graph basics

There are many options available for graphing, including the number of graphs to presentper page, the x and y limits of the graph, the x and y labels, the point size, the color, thetype of line, etc. These are all specified in the help for plot, and the associated links.Here are just a few of the high points.

par Graphic options are stored in an object, op. These can be changed by using the parfunction which will set the graphics options to particular values. A typical use is toset the number of plots per page. This is done before calling a specific plot function.op <- par(mfrow=c(3,2)) #will put 3 rows of 2 columns of graphs on a page

x(y)lim The ranges of the x (y) variable. These are set inside the particular graphics call.xlim =c(0,10) #will make the axis range from 0 to 10. I

type Choose between p,l,b (points, lines, both)

pch What plotting character to use.

lty What line type (solid, dashed, dotted, etc.) item[col] What color to use.

6.4 Regression plots with fits

A more typical graphics problem is to plot regression lines, perhaps with the underlyingdata.

In addition to showing the regression analysis for presentations, one can also examine theresiduals and errors of the regression.

29

Page 5: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

> op <- par(mfrow = c(3, 2))

> barplot(colMeans(na.omit(X.df)), ylim = c(0, 14), main = "A particularly uninformative graph")

> box()

> error.bars(X.df, bars = TRUE, ylim = c(0, 14), main = "Somewhat more informative")

> boxplot(X.df, main = "Better yet")

> stripchart(X.df, method = "stack", vertical = TRUE, main = "Perhaps better")

> stripchart(X.df, method = "stack", vertical = TRUE, main = "Add error bars")

> error.bars(X.df, add = TRUE)

> error.bars(X.df, main = "Just error bars")

> op <- par(mfrow = c(1, 1))

x1 x2 x3 x4

A particularly uninformative graph

04

812

Somewhat more informative

Independent Variable

Depe

nden

t Var

iabl

e

04

812

x1 x2 x3 x4

04

812

x1 x2 x3 x4

26

1014

Better yet

x1 x2 x3 x4

26

1014

Perhaps better

x1 x2 x3 x4

26

1014

Add error bars

● ●

● ●

Just error bars

Independent Variable

Depe

nden

t Var

iabl

e

x1 x2 x3 x4

35

79

Figure 7: Six di!erent ways of presenting the di!erences between four groups.

30

Page 6: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

> op <- par(mfrow = c(3, 2))

> plot(1:10)

> plot(1:10, xlab = "Label the x axis", ylab = "label the y axis", main = "And add a title and new data",

+ pch = 21, col = "blue")

> points(1:4, 3:6, bg = "red", pch = 22)

> plot(1:10, xlab = "x is oversized", ylab = "y axis label", main = "Change the axis sizes",

+ pch = 23, bg = "blue", xlim = c(-5, 15), ylim = c(0, 20))

> points(1:4, 13:16, bg = "red", pch = 24)

> plot(1:10, ylab = "y axis label", main = "Line graph", pch = 23, bg = "blue", type = "l")

> plot(1:10, 2:11, xlab = "X axis", ylab = "y axis label", main = "Line graphs with and without points",

+ pch = 23, bg = "blue", type = "b", ylim = c(0, 15))

> points(1:10, 12:3, type = "l", lty = "dotted")

> curve(cos(x), -2 * pi, 2 * pi, main = "Show a curve for a function")

> op <- par(mfrow = c(1, 1))

●●

●●

●●

●●

●●

2 4 6 8 10

24

68

Index

1:10

●●

●●

●●

●●

●●

2 4 6 8 10

24

68

And add a title and new data

Label the x axis

labe

l the

y a

xis

−5 0 5 10 15

05

15

Change the axis sizes

x is oversized

y ax

is la

bel

2 4 6 8 10

24

68

Line graph

Index

y ax

is la

bel

2 4 6 8 10

05

1015

Line graphs with and without points

X axis

y ax

is la

bel

−6 −4 −2 0 2 4 6

−1.0

0.0

1.0

Show a curve for a function

x

cos(

x)

Figure 8: Various options of base graphics. Panel 1 is the basic call to plot. Panel 2 is thesame, but with labels and titles. Panel 3 shows how to add more data, panel 4 how to doa line graph, panel 5 adds a second line, panel 6 is just an example the curve function.31

Page 7: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

> data(sat.act)

> with(sat.act, plot(SATQ ~ SATV, main = "SAT Quantitative varies with SAT Verbal"))

> model = lm(SATQ ~ SATV, data = sat.act)

> abline(model)

> lab <- paste("SATQ = ", round(model$coef[1]), "+", round(model$coef[2], 2), "* SATV")

> text(600, 200, lab)

● ●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

● ●●

●●●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

● ●

●●

●●

● ●

●●

●●

● ●

●●

●●

● ●

200 300 400 500 600 700 800

200

300

400

500

600

700

800

SAT Quantitative varies with SAT Verbal

SATV

SATQ

SATQ = 208 + 0.66 * SATV

Figure 9: A regression data set with a regression line

32

Page 8: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

> data(sat.act)

> color <- c("blue", "red")

> with(sat.act, plot(SATQ ~ SATV, col = color[gender], main = "SATQ varies by SATV and gender"))

> by(sat.act, sat.act$gender, function(x) abline(lm(SATQ ~ SATV, data = x)))

sat.act$gender: 1NULL-----------------------------------------------------------------------------------sat.act$gender: 2NULL

● ●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

● ●●

●●●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

● ●

●●

●●

● ●

●●

●●

● ●

●●

●●

● ●

200 300 400 500 600 700 800

200

300

400

500

600

700

800

SATQ varies by SATV and gender

SATV

SATQ

Figure 10: A regression data set with two regression lines. The higher regression line is forthe women.

33

Page 9: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

> op <- par(mfrow = c(2, 2))

> plot(lm(SATQ ~ SATV, data = sat.act))

> op <- par(mfrow = c(1, 1))

400 500 600 700

−300

020

0

Fitted values

Resid

uals

●● ● ●

● ●

● ●●●

●●

●●●

●●

● ●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

● ●

●●

●●

● ●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●● ●

●●●●

●●●

● ●● ●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●● ●●●

● ●●

●●

●● ●

●●

●●●●

●●

● ●●

●● ●

● ●

●●●

●●

● ●●

●●

● ●●

● ● ●●

●●

● ●

● ●

● ●

● ●

●●●

●●

●●

● ●

●●

●●

● ●●● ●

● ●●

● ●●

●●

● ●

● ●

●●

● ●

●●

● ●●

● ●

●●

●●

●●

● ●●

●●●

●●

●●

●●●●●

●●●

●●

●●

●●

●● ●● ●●

●●

●●

●●

●●

●●●

● ●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ● ●●●●

● ●

●●

●●

● ●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●●

●●●

Residuals vs Fitted

3343636271

35345

●●●

●●

●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●●

●●●

●●●

●●

●●●●

●●●

●●●

●●

●●●

●●

●●●

●●

●●●

●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●●●

●●●

●●

●●

●●

●●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●●

●●●

−3 −2 −1 0 1 2 3−3

−11

3

Theoretical Quantiles

Stan

dard

ized

resid

uals

Normal Q−Q

3343636271

35345

400 500 600 700

0.0

0.5

1.0

1.5

Fitted values

Stan

dard

ized

resid

uals

● ● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●●

●●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●● ● ●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●●

●●

●●

●● ●

●●

● ●

●●●

●●

●●

●●

●●

● ●●

●●●

●●

● ●

●●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

Scale−Location334363627135345

0.000 0.010 0.020

−4−2

02

4

Leverage

Stan

dard

ized

resid

uals

●●●●

●●

● ●●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●● ●

●●

●●

● ●

●●

●●

●●

●●●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●● ●

●●●●

●●●

●●●●

●●

●●●

●●

● ●

●●●●●

●●

●●

●●

●●● ●●●

●●●

●●

●●●

●●

●●●●●●

●●●

●●●

●●

●●●

●●

●●●

●●

● ●●

●●●●

●●

● ●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●●●●

●●●

●●●

●●

● ●

● ●

● ●

● ●

●●

●●●●●

●●

●●

●●

● ●●

●●●

●●●

●●

●● ●

●●●●●

●●●

●●

●●

●●

● ●● ●●●

● ●

●●

●●

●●

●●●

● ●

●●

●●●●

●●●●

●●

●●

●●

●●

●●●

● ●

●●●

●●●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●●

●●●●

Cook's distance

Residuals vs Leverage

308993534535938

Figure 11: default

34

Page 10: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

6.5 ANOVA plots

7 More complex graphics

In addition to the standard ways of displaying data, R includes packages meant for moregraphical displays. These including mapping functions related to GIS files in the mappackage, density plots, and in particular graphs using such packages as Rraghviz .

7.1 Examples of Rgraphviz

Several of the psychometric functions in psych make use of Rgraphviz and are described inmore detail in the vignette psych for sem. The next two figures take advantage of built indata sets, Harman74.cor and Thurstone.

This next figure, produced by structure.graph shows a symbolic structural equation pathmodel.

> fxs <- structure.list(9, list(X1 = c(1, 2, 3), X2 = c(4, 5, 6), X3 = c(7, 8, 9)))

> phi <- phi.list(4, list(F1 = c(4), F2 = c(4), F3 = c(4), F4 = c(1, 2, 3)))

> fyx <- structure.list(3, list(Y = c(1, 2, 3)), "Y")

7.2 Using the maps package to process GIS data files

A great deal of geographic data is stored in GIS files on servers around the world. TheseGIS description files include all kinds of information, including the geographic coordinatesof geographical regions (cities, states, countries, rivers, harbours, etc.) that can then beplotted using the maps package and its alternatives. The next figure is just a demonstrationof what can be done (Figure 15).

7.3 Using graphics to teach sampling theory

Most students recognize that increasing the sample size will reduce the standard error ofthe measures and increase the ability to detect an e!ect if it is there. Unfortunately, somedo not realize that the probability of a Type I error does not change as a function ofsample size. A simple demonstration of the e!ect of increasing sample size on the width ofthe confidence intervals and also the probability of that confidence interval containing thepopulation value is seen in Figure 16)

35

Page 11: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

> data(Harman74.cor)

> ic <- ICLUST(Harman74.cor$cov, title = "The Holzinger-Harman 24 mental measurement problem")

The Holzinger−Harman 24 mental measurement problem

VisualPerception

Cubes

PaperFormBoard

Flags

GeneralInformation

PargraphComprehension

SentenceCompletion

WordClassification

WordMeaning

Addition

Code

CountingDots

StraightCurvedCapitals

WordRecognition

NumberRecognition

FigureRecognition

ObjectNumber

NumberFigure

FigureWord

Deduction

NumericalPuzzles

ProblemReasoning

SeriesCompletion

ArithmeticProblems

C1

C2

C3

C4

C5

C6

C7

C8

C9

C10

C11 C12

C13

C14

C15

C16

C17C18

C19

C20

C21

C22C23

0.710.71

0.850.85

0.850.85

0.760.76

0.730.73

0.670.67

0.690.69

0.660.66

0.940.94

0.880.86

0.710.71

0.88

0.85

0.820.95

0.750.8

0.870.86

0.640.64

0.80.79

0.890.85

0.750.81

0.82

0.82

0.9

0.84

0.82

0.92

0.730.94

Figure 12: An example of tree diagram produced by the hierarchical cluster algorithm,ICLUST and drawn using Rgraphviz. The data set is 24 mental measurements used byHolzinger and Harman as an example factor analysis problem.

36

Page 12: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

> data(bifactor)

> om <- omega(Thurstone, main = "A bifactor solution to a Thurstone data set")

Omega

Sentences

Vocabulary

Sent.Completion

First.Letters

4.Letter.Words

Suffixes

Letter.Series

Pedigrees

Letter.Group

g

F1*

F2*

F3*

0.7

0.7

0.7

0.60.60.60.6

0.6

0.5

0.60.60.5

0.60.50.4

0.60.30.5

Figure 13: An example of bifactor model using Rgrapviz

37

Page 13: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

> if (require(Rgraphviz)) {

+ sg3 <- structure.graph(fxs, phi, fyx)

+ } else {

+ plot(1:4, main = "Rgraphviz is not available")

+ }

Structural model

x1

x2

x3

x4

x5

x6

x7

x8

x9

y1

y2

y3

X1

X2

X3

YYa1Ya2Ya3

a1a2a3

b4b5b6

c7c8c9

rad

rbd

rcd

Figure 14: A symbolic structural model. Three independent latent variables are regressedon a latent Y.

38

Page 14: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

> library(maps)

> map("county")

Figure 15: The counties of the US may be combined with demographic data to displayincome, voting records, education or any other data set organized by county.

39

Page 15: 6 Day 2: Graphical data displays and Exploratory Data Analysis · 6 Day 2: Graphical data displays and Exploratory Data Analysis A compelling reason to use R is for its graphics capabilities

> op <- par(mfrow = c(3, 1))

> set.seed(42)

> x <- matrix(rnorm(500), ncol = 20)

> error.bars(x, ylim = c(-1, 1), main = "N= 25")

> abline(h = 0)

> x <- matrix(rnorm(2000), ncol = 20)

> error.bars(x, ylim = c(-1, 1), main = "N = 100")

> abline(h = 0)

> x <- matrix(rnorm(8000), ncol = 20)

> error.bars(x, ylim = c(-1, 1), main = "N = 400")

> abline(h = 0)

> op <- par(mfrow = c(1, 1))

● ●● ●

● ●●

●●

● ●

● ●●

N= 25

Independent Variable

Depe

nden

t Var

iabl

e

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

−1.0

0.0

1.0

● ● ●

● ●●

● ●

●● ●

●● ●

● ● ●

N = 100

Independent Variable

Depe

nden

t Var

iabl

e

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

−1.0

0.0

1.0

●● ● ●

● ● ● ● ● ● ● ● ● ●●

● ● ● ● ●

N = 400

Independent Variable

Depe

nden

t Var

iabl

e

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

−1.0

0.0

1.0

Figure 16: Increasing the sample size reduces the width of the confidence interval, but doesnot change the probability of including the population value.

40