Exploring Data with Statistical Graphics in R

Duncan Murdoch

Department of Statistical and Actuarial Sciences
University of Western Ontario

November 23, 2011

1 Statistical Graphics

2 About R
    R History
    What is R today?

3 Graphics in R
    Overview
    Examples

4 The rgl package

Example

In STAT 1024, I use multiple choice exams. I use multiple versions of the test to make it harder to cheat.
The software our department uses (developed by others) prints tables of correct answers for each version of the test; we can look at those to recognize problem questions.
Doing it graphically is better...
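A rough sketch of the idea, using made-up data rather than real exam results: tabulate the answers by test version, then draw the same table as one row of bars per version. The version labels, the number of students, and the answer probabilities are all assumptions for illustration.

```r
# Hypothetical data: 200 students, 4 test versions, one question.
set.seed(1)
versions <- c("350", "574", "790", "889")   # made-up version labels
choices  <- c("A", "B", "C", "D", "E")
version  <- factor(sample(versions, 200, replace = TRUE), levels = versions)
# Assume most students pick "C" (an invented answer pattern).
answer   <- factor(sample(choices, 200, replace = TRUE,
                          prob = c(0.1, 0.1, 0.6, 0.1, 0.1)),
                   levels = choices)

# The tabular view: counts of each answer within each version.
counts <- table(version, answer)
print(counts)

# The graphical view: one bar plot per version, stacked vertically.
op <- par(mfrow = c(length(versions), 1), mar = c(2, 4, 1, 1))
for (v in versions) {
  barplot(counts[v, ], ylab = v, ylim = c(0, max(counts)))
}
par(op)
```

A version whose bars look different from the others stands out at a glance, which is much harder to see in a printed table.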

[Figure: six panels, one per question: Q23 (96% correct), Q21 (92% correct), Q4 (89% correct), Q3 (89% correct), Q5 (88% correct), Q14 (84% correct). Each panel has the answer choices A to E on the horizontal axis and rows labelled 350, 574, 790 and 889.]

[Figure: six panels, one per question: Q25 (40% correct), Q28 (39% correct), Q9 (38% correct), Q29 (33% correct), Q13 (33% correct), Q16 (15% correct). Each panel has the answer choices A to E on the horizontal axis and rows labelled 350, 574, 790 and 889.]

What is R?

R is a software environment for statistical computing and graphics.
It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
It is free (in both senses).
Its web site is http://www.r-project.org.

First there was S...

S began as a project at Bell Laboratories in 1976, involving John Chambers, Rick Becker, Doug Dunn, Paul Tukey, and Graham Wilkinson.

We wanted users to be able to begin in an interactive environment, where they did not consciously think of themselves as programming. Then as their needs became clearer and their sophistication increased, they should be able to slide gradually into programming, when the language and system aspects would become more important.

— John Chambers, in Stages in the Evolution of S

The Birth of R

Around 1992, Ross Ihaka and Robert Gentleman at the University of Auckland needed statistical software for a teaching lab. S-PLUS (a commercial implementation of S) didn’t run on MacOS; they decided to write their own. Ross was very impressed with Scheme (a Lisp dialect); they based the initial code on the design of Scheme interpreters.

To make the interpreter useful, we had to add data structures to support statistical work and to choose a user interface. We wanted a command driven interface and, since we were both very familiar with S, it seemed natural to use an S-like syntax.

— Ross Ihaka, in R: Past and Future History

R is a Moving Target

R has two “new feature” releases and approximately two “patch” releases per year.
Since October 2004, there have been about 1800 NEWS entries about changes to R. Most of these are “tweaking”: making the code more consistent, bug-free, efficient, etc.
We try to balance back-compatibility with progress: most old code will still work, but not all.

Contributed Packages!

There are about 3350 contributed packages on CRAN (http://cran.r-project.org), 1100 packages in Bioconductor (http://www.bioconductor.org), and many other packages in private use.
Publishing an R package has become an accepted (almost expected) way to disseminate statistical research.

Several Different Systems

R has two underlying graphics systems:
Classic S graphics is an “ink on paper” model: functions add things to the current graphics device, possibly covering what was there, but only erasing the whole frame at once.
Paul Murrell’s grid graphics is based on a hierarchical model of a scene, with transformations possible at each level. It is much more flexible than classic graphics, allowing objects to be changed, but is not itself interactive.
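A minimal sketch of the difference between the two systems. The regression line, the viewport size and the object name "dot" are my own choices for illustration; the calls themselves come from the graphics and grid packages that ship with R.

```r
library(grid)   # part of the standard R distribution

# Classic graphics: each call adds "ink" to the current device.
plot(cars$speed, cars$dist)
# The line is drawn on top of the points; it cannot be removed afterwards
# without redrawing the whole plot.
abline(lm(dist ~ speed, data = cars), col = "red")

# grid: the scene is built from named objects inside nested viewports.
grid.newpage()
pushViewport(viewport(width = 0.8, height = 0.8))   # a nested coordinate system
grid.rect(gp = gpar(col = "grey"))
grid.circle(x = 0.5, y = 0.5, r = 0.2, name = "dot",
            gp = gpar(fill = "blue"))
# Unlike classic graphics, an existing object can be changed in place.
grid.edit("dot", gp = gpar(fill = "red"))
popViewport()
```

The edit redraws the scene with the circle filled in red, but nothing here responds to the mouse: that is what "not itself interactive" means.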

High Level Packages

Classic high level graphics are supported by the graphics package.
Deepayan Sarkar’s lattice package implements a version of Cleveland’s trellis graphics using grid.
Hadley Wickham’s ggplot2 package implements a version of Leland Wilkinson’s Grammar of Graphics using grid.
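To give a flavour of the trellis style, here is a small sketch using lattice, which is normally installed alongside R. The choice of the iris data and of a three-panel layout is mine, not from the talk.

```r
library(lattice)

# One scatterplot panel per species: the "|" in the formula means
# "conditioned on", which is the core idea of trellis graphics.
p <- xyplot(Sepal.Length ~ Sepal.Width | Species, data = iris,
            layout = c(3, 1))
print(p)   # lattice functions return an object; printing it draws the plot
```

Because the plot is an object, it can be stored, modified with update(), and drawn later, which is one practical consequence of building on grid.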

Other Systems

rggobi is an interface to the GGobi package for interactive graphics.
rgl is a package that provides rotatable 3-D graphics, with some classic S-like plotting functions.
scatterplot3d does static 3-D graphics within the classic graphics engine.
iplots does interactive, linked graphics using Java.

Learning about Graphics

I won’t cover all the graphics systems today, only classic graphics and rgl.

For more depth I highly recommend Paul Murrell’s book R Graphics to learn either classic S graphics or grid.
Deepayan Sarkar’s Lattice: Multivariate Data Visualization with R describes that system.
Other books on R are listed at http://www.r-project.org/doc/bib/R-books.html.

Scatterplots

> plot(cars$speed, cars$dist, col="blue",
+      main="Main title", xlim = c(0, 30))

[Figure: scatterplot titled "Main title", with cars$speed on the horizontal axis (0 to 30) and cars$dist on the vertical axis (0 to 120).]

Bar plots

> barplot(VADeaths, beside=TRUE, legend=TRUE,
+         ylab="Deaths per 1000", ylim=c(0, 90))

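[Figure: grouped bar plot of VADeaths. Groups along the horizontal axis: Rural Male, Rural Female, Urban Male, Urban Female. Legend: age groups 50-54, 55-59, 60-64, 65-69, 70-74. Vertical axis: Deaths per 1000, 0 to 80.]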

Dot charts

> dotchart(VADeaths, xlab="Deaths per 1000",
+          xlim=c(0, 75))

[Figure: dot chart of VADeaths in four blocks (Rural Male, Rural Female, Urban Male, Urban Female), each with one dot per age group 50-54, 55-59, 60-64, 65-69, 70-74. Horizontal axis: Deaths per 1000, 0 to 60.]

Histograms

> x <- rnorm(10000)
> hist(x, breaks="Scott")

[Figure: histogram titled "Histogram of x", with x on the horizontal axis (-4 to 4) and Frequency on the vertical axis (0 to 800).]
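For the curious, breaks="Scott" asks hist() to choose the number of bins by Scott's rule. As far as I know, R's nclass.scott() uses a bin width of 3.5 s n^(-1/3), where s is the sample standard deviation; the sketch below recomputes that by hand. The seed and the variable names h and k are mine.

```r
set.seed(42)                 # arbitrary seed, only for reproducibility
x <- rnorm(10000)

# Scott's rule: bin width proportional to sd(x) * n^(-1/3).
h <- 3.5 * sd(x) * length(x)^(-1/3)
# Number of bins needed to cover the range of the data.
k <- ceiling(diff(range(x)) / h)

# Compare with the helper that hist() calls for breaks = "Scott".
print(c(by_hand = k, nclass.scott = nclass.scott(x)))

# hist() treats this number as a suggestion and then picks "pretty"
# break points, so the number of bars actually drawn can differ from k.
res <- hist(x, breaks = "Scott", plot = FALSE)
print(length(res$counts))
```

The last point explains why the bars in a default histogram usually start and end at round numbers.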

QQ Plots

> x <- rt(1000, 3)
> qqnorm(x); qqplot(rnorm(1000), x)

[Figure: two panels. Left: "Normal Q-Q Plot", with Theoretical Quantiles on the horizontal axis (-3 to 3) and Sample Quantiles on the vertical axis (-15 to 15). Right: Q-Q plot with rnorm(1000) on the horizontal axis (-3 to 3) and x on the vertical axis (-15 to 15).]
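What qqnorm() draws can be reproduced by hand: the sorted sample is plotted against normal quantiles at evenly spread probabilities. This is a sketch of that construction; the seed and the names theo and samp are my own.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
x <- rt(1000, 3)                     # heavy-tailed sample: t with 3 df

# Theoretical quantiles: standard normal quantiles at the
# plotting positions returned by ppoints().
theo <- qnorm(ppoints(length(x)))
# Sample quantiles: simply the sorted data.
samp <- sort(x)

plot(theo, samp,
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Normal Q-Q plot by hand")
qqline(x)                            # reference line through the quartiles

# qqnorm() returns the same coordinates (in the original data order).
qq <- qqnorm(x, plot.it = FALSE)
```

Points that bend away from the reference line at both ends are the signature of tails heavier than the normal distribution.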

Box Plots

> boxplot(Sepal.Length ~ Species, data=iris)
> # More on the formula interface later!

[Figure: box plots of Sepal.Length for the species setosa, versicolor and virginica; vertical axis from 4.5 to 8.0.]
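The formula Sepal.Length ~ Species reads as "Sepal.Length split by Species". A small sketch of what that amounts to, with names (groups, b) chosen by me:

```r
# Splitting the response by the grouping factor gives a list of vectors...
groups <- split(iris$Sepal.Length, iris$Species)
# ...and boxplot() on that list draws the same three boxes.
boxplot(groups)

# The numbers behind the picture are available without plotting.
b <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)
# One column per species; rows are lower whisker, lower hinge,
# median, upper hinge, upper whisker.
print(b$stats)
print(b$names)
```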

Mosaic Plots

> mosaicplot(~ Sex + Age + Survived,
+            data = Titanic, shade = TRUE)

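[Figure: mosaic plot titled "Titanic", with Sex (Male, Female) on the horizontal axis, Age (Child, Adult) on the vertical axis, and tiles split by Survived (No, Yes). Shading legend "Standardized Residuals": <-4, -4:-2, -2:0, 0:2, 2:4, >4.]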
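With shade = TRUE the tiles are coloured by residuals from a log-linear model. My understanding is that the default is the model of mutual independence with Pearson residuals, so the sketch below should approximate what the shading encodes; treat it as an illustration rather than a description of the internals.

```r
# Collapse the 4-way Titanic table over Class, keeping Sex, Age, Survived.
tab <- margin.table(Titanic, c(2, 3, 4))

# Fit the mutual independence model: each factor has its own margin,
# with no interactions between them.
fit <- loglin(tab, margin = list(1, 2, 3), fit = TRUE, print = FALSE)

# Pearson residuals: (observed - expected) / sqrt(expected).
res <- (tab - fit$fit) / sqrt(fit$fit)
print(round(res, 1))
```

Large positive residuals mark cells with many more people than independence would predict, and large negative ones mark cells with many fewer.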

Maps

> library(mapdata)
> map("worldHires",
+     projection = "azequalarea",
+     orientation = c(43, -81.25, 0),
+     col = sample(terrain.colors(100)),
+     fill = TRUE)

The maps package is by Ray Brownrigg, based on S code by Rick Becker and Allan Wilks.

Correlation Plots—plotcorr() in ellipse package

[Figure: correlation matrix drawn as ellipses, with rows and columns in the order wt, cyl, disp, hp, carb, qsec, gear, am, vs, drat, mpg (the variables of the mtcars data set).]
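The slide does not show the code for this plot. A sketch that produces a similar picture, assuming the mtcars data and an ordering of the variables by their correlation with mpg (which gives the order wt, cyl, disp, hp, carb, qsec, gear, am, vs, drat, mpg):

```r
# Correlation matrix of the mtcars variables.
Corr <- cor(mtcars)

# Order the variables by their correlation with mpg, from most
# negative to most positive, so similar variables sit together.
ord <- order(Corr["mpg", ])
xc  <- Corr[ord, ord]
print(rownames(xc))

# plotcorr() lives in the contributed ellipse package, which may not
# be installed; skip the plot quietly if it is missing.
if (requireNamespace("ellipse", quietly = TRUE)) {
  ellipse::plotcorr(xc)
}
```

Each ellipse is a contour of a bivariate normal distribution with the given correlation: narrow ellipses mean strong correlation, and the tilt shows its sign.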

The rgl package

Daniel Adler and I wrote this over the last few years. It’s a wrapper for the OpenGL graphics rendering library:

Can display solid 3D objects, with real-time rotation, hidden surface removal, etc.
High level functions corresponding to classic S 2D graphics: plot3d(), etc.
Low level functions for flashy computer graphics effects.

> library(rgl)
> x <- sort(rnorm(1000)); y <- rnorm(1000)
> z <- rnorm(1000) + atan2(x,y)
> plot3d(x,y,z, col=rainbow(1000), type="s")

Now some demos...

Summary

Graphical displays often give new insights into data.
R provides a very flexible and rich system for producing graphics.
