Statistical GraphicsAbout R
Graphics in RThe rgl package
Exploring Data with Statistical Graphics in R
Duncan Murdoch
Department of Statistical and Actuarial Sciences
University of Western Ontario
November 23, 2011
1 Statistical Graphics
2 About R
    R History
    What is R today?
3 Graphics in R
    Overview
    Examples
4 The rgl package
Example
In STAT 1024, I use multiple choice exams. I use multiple versions of the test to make it harder to cheat.
The software our department uses (developed by others) prints tables of correct answers for each version of the test; we can look at those to recognize problem questions.
Doing it graphically is better...
[Figure: six panels showing answer choices A to E, with rows labelled 350, 574, 790 and 889. Panel titles: Q23 96% correct; Q21 92% correct; Q4 89% correct; Q3 89% correct; Q5 88% correct; Q14 84% correct.]
[Figure: six panels showing answer choices A to E, with rows labelled 350, 574, 790 and 889. Panel titles: Q25 40% correct; Q28 39% correct; Q9 38% correct; Q29 33% correct; Q13 33% correct; Q16 15% correct.]
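A display of this kind might be sketched as follows. This is only an illustration under stated assumptions: the responses are simulated, the answer key is invented, and the labels 350, 574, 790 and 889 are assumed to identify the test versions. It is not the departmental software or the code behind the figures.

```r
# Simulated responses to one question on four test versions (hypothetical data)
set.seed(1)
versions <- c("350", "574", "790", "889")  # assumed to be version labels
choices  <- LETTERS[1:5]                   # answer choices A to E
n_per_version <- 50                        # hypothetical class size per version

# Hypothetical answer key: the correct choice differs between versions
key <- c("350" = "B", "574" = "D", "790" = "A", "889" = "E")

responses <- do.call(rbind, lapply(versions, function(v) {
  # 80% of simulated students pick the keyed answer, 5% each other choice
  p <- ifelse(choices == key[[v]], 0.8, 0.05)
  data.frame(version = v,
             answer  = sample(choices, n_per_version, replace = TRUE, prob = p))
}))
responses$answer <- factor(responses$answer, levels = choices)

# One row of counts per version, one column per choice
counts <- table(responses$version, responses$answer)

# One bar chart per version, with the keyed answer highlighted
op <- par(mfrow = c(length(versions), 1), mar = c(2, 4, 1, 1))
for (v in versions) {
  barplot(counts[v, ],
          col  = ifelse(choices == key[[v]], "darkgreen", "grey"),
          ylab = v)
}
par(op)
```

A question where a grey bar is as tall as the highlighted one on some version would stand out at a glance, which is harder to spot in a table of numbers.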
What is R?
R is a software environment for statistical computing and graphics.
It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
It is free (in both senses).
Its web site is http://www.r-project.org.
First there was S...
S began as a project at Bell Laboratories in 1976, involving John Chambers, Rick Becker, Doug Dunn, Paul Tukey, and Graham Wilkinson.
We wanted users to be able to begin in an interactive environment, where they did not consciously think of themselves as programming. Then as their needs became clearer and their sophistication increased, they should be able to slide gradually into programming, when the language and system aspects would become more important.
— John Chambers, in Stages in the Evolution of S
The Birth of R
Around 1992, Ross Ihaka and Robert Gentleman at the University of Auckland needed statistical software for a teaching lab. S-PLUS (a commercial implementation of S) didn’t run on MacOS; they decided to write their own. Ross was very impressed with Scheme (a Lisp dialect); they based the initial code on the design of Scheme interpreters.
To make the interpreter useful, we had to add data structures to support statistical work and to choose a user interface. We wanted a command driven interface and, since we were both very familiar with S, it seemed natural to use an S-like syntax.
— Ross Ihaka, in R: Past and Future History
R is a Moving Target
R has two “new feature” releases and approximately two “patch” releases per year.
Since October 2004, there have been about 1800 NEWS entries about changes to R. Most of these are “tweaking”: making the code more consistent, bug-free, efficient, etc.
We try to balance back-compatibility with progress: most old code will still work, but not all.
Contributed Packages!
There are about 3350 contributed packages on CRAN (http://cran.r-project.org), 1100 packages in Bioconductor (http://www.bioconductor.org), and many other packages in private use.
Publishing an R package has become an accepted (almost expected) way to disseminate statistical research.
Several Different Systems
R has two underlying graphics systems:
Classic S graphics is an “ink on paper” model: functions add things to the current graphics device, possibly covering what was there, but only erasing the whole frame at once.
Paul Murrell’s grid graphics is based on a hierarchical model of a scene, with transformations possible at each level. It is much more flexible than classic graphics, allowing objects to be changed, but is not itself interactive.
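The difference between the two models can be sketched in a few lines. This is a minimal illustration, not taken from the talk; the grob name "disc" and the colours are arbitrary choices.

```r
library(grid)  # grid ships with R

# Classic graphics: each call adds "ink" to the current device.
plot(1:10, main = "Classic graphics")
points(5, 5, pch = 16, cex = 3, col = "grey")  # drawn on top of the plot
points(5, 5, pch = 16, cex = 3, col = "red")   # can only paint over the grey
                                               # point, not alter it

# grid graphics: drawn objects are kept in a scene and can be changed later.
grid.newpage()
grid.circle(r = 0.3, name = "disc", gp = gpar(fill = "grey"))
grid.edit("disc", gp = gpar(fill = "red"))     # modifies the existing circle

# The edited object can be retrieved from the scene by name
edited <- grid.get("disc")
```

In the classic case the grey point is still there underneath the red one; in the grid case there is only one circle, and its fill has been changed.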
High Level Packages
Classic high level graphics are supported by the graphics package.
Deepayan Sarkar’s lattice package implements a version of Cleveland’s trellis graphics using grid.
Hadley Wickham’s ggplot2 package implements a version of Leland Wilkinson’s Grammar of Graphics using grid.
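For comparison, the same scatterplot of the built-in cars data might be written in each of the two grid-based packages as follows. This is a sketch only; the titles are arbitrary, and the ggplot2 part runs only if that package happens to be installed.

```r
library(lattice)  # a recommended package, included with standard R installs

# lattice: a formula describes the plot; the result is an object
p_lattice <- xyplot(dist ~ speed, data = cars, main = "lattice")
print(p_lattice)  # printing the object draws it

# ggplot2: the plot is built up from data, aesthetic mappings and layers
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_gg <- ggplot(cars, aes(x = speed, y = dist)) +
    geom_point() +
    ggtitle("ggplot2")
  print(p_gg)
}
```

In both packages the plot is an ordinary R object that is only drawn when printed, in contrast to classic functions such as plot(), which draw immediately.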
Other Systems
rggobi is an interface to the GGobi package for interactive graphics.
rgl is a package that provides rotatable 3-D graphics, with some classic S-like plotting functions.
scatterplot3d does static 3-D graphics within the classic graphics engine.
iplots does interactive, linked graphics using Java.
Learning about Graphics
I won’t cover all the graphics systems today, only classic graphics and rgl.
For more depth I highly recommend Paul Murrell’s book R Graphics to learn either classic S graphics or grid.
Deepayan Sarkar’s Lattice: Multivariate Data Visualization with R describes that system.
Other books on R are listed at http://www.r-project.org/doc/bib/R-books.html.
Scatterplots
> plot(cars$speed, cars$dist, col="blue",
+      main="Main title", xlim = c(0, 30))
[Figure: scatterplot titled "Main title"; x axis cars$speed (0 to 30), y axis cars$dist (0 to 120).]
Bar plots
> barplot(VADeaths, beside=TRUE, legend=TRUE,
+         ylab="Deaths per 1000", ylim=c(0, 90))
[Figure: grouped bar plot with groups Rural Male, Rural Female, Urban Male, Urban Female; legend for age groups 50-54, 55-59, 60-64, 65-69, 70-74; y axis "Deaths per 1000" (0 to 80).]
Dot charts
> dotchart(VADeaths, xlab="Deaths per 1000",
+          xlim=c(0, 75))
[Figure: dot chart with sections Rural Male, Rural Female, Urban Male, Urban Female, each listing age groups 50-54, 55-59, 60-64, 65-69, 70-74; x axis "Deaths per 1000" (0 to 60).]
Histograms
> x <- rnorm(10000)
> hist(x, breaks="Scott")
[Figure: "Histogram of x"; x axis x (−4 to 4), y axis Frequency (0 to 800).]
QQ Plots
> x <- rt(1000, 3)
> qqnorm(x); qqplot(rnorm(1000), x)
[Figure: two panels. Left: "Normal Q−Q Plot", x axis Theoretical Quantiles (−3 to 3), y axis Sample Quantiles (−15 to 15). Right: x axis rnorm(1000) (−3 to 3), y axis x (−15 to 15).]
Box Plots
> boxplot(Sepal.Length ~ Species, data=iris)
> # More on the formula interface later!
[Figure: box plots for setosa, versicolor and virginica; y axis from 4.5 to 8.0.]
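As a brief aside on the formula used in the boxplot() call, a formula of the form response ~ group asks for the response to be split by the grouping variable, with both names looked up in data. The sketch below is an illustration, not part of the talk. It compares the formula call with the same grouping written out by hand, using plot = FALSE so that only the computed summaries are returned.

```r
# Formula interface: Sepal.Length split by Species, both found in `iris`
by_formula <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

# The same grouping done explicitly with split()
by_split <- boxplot(split(iris$Sepal.Length, iris$Species), plot = FALSE)

# Both calls compute the same five-number summaries, one column per species
same_stats <- isTRUE(all.equal(by_formula$stats, by_split$stats))

# Draw the plot using the more compact formula version
boxplot(Sepal.Length ~ Species, data = iris)
```

The formula version is shorter and keeps the variable names for axis labelling, which is one reason many modelling and plotting functions in R accept formulas.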
Mosaic Plots
> mosaicplot(~ Sex + Age + Survived,
+            data = Titanic, shade = TRUE)
[Figure: mosaic plot titled "Titanic"; Sex (Male, Female) across, Age (Child, Adult) down, split by Survived (No, Yes); shading legend "Standardized Residuals": <−4, −4:−2, −2:0, 0:2, 2:4, >4.]
Maps
> library(mapdata)
> map("worldHires",
+     projection = "azequalarea",
+     orientation = c(43, -81.25, 0),
+     col = sample(terrain.colors(100)),
+     fill = TRUE)
The maps package is by Ray Brownrigg, based on S code by Rick Becker and Allan Wilks.
Correlation Plots—plotcorr() in ellipse package
[Figure: correlation plot with rows and columns labelled wt, cyl, disp, hp, carb, qsec, gear, am, vs, drat, mpg.]
The rgl package
Daniel Adler and I wrote this over the last few years. It’s a wrapper for the OpenGL graphics rendering library:
Can display solid 3D objects, with real-time rotation, hidden surface removal, etc.
High level functions corresponding to classic S 2D graphics: plot3d(), etc.
Low level functions for flashy computer graphics effects.
> x <- sort(rnorm(1000)); y <- rnorm(1000)
> z <- rnorm(1000) + atan2(x,y)
> plot3d(x,y,z, col=rainbow(1000), type="s")
Now some demos...
Summary
Graphical displays often give new insights into data.
R provides a very flexible and rich system for producing graphics.