ggplot2: A backstage tour
Hadley Wickham
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics
Rice University
February 2012
Wednesday, February 8, 12
1. Why ggplot2?
2. Sneak peek and new features
3. Best practices
4. Questions
Poll: What graphics system are you currently using?
Why ggplot2?
[Figure: "WHC", whc plotted against day (2004)]
“Nothing is as practical as a good theory”—Kurt Lewin
“[A good model] will bring together in a coherent way things that previously appeared unrelated and which also will provide a basis for dealing systematically with new situations”—David Cox
A plot is made up of multiple layers.
A layer consists of data, a set of mappings between variables and aesthetics, a geometric object, and a statistical transformation.
Scales control the details of the mapping.
All components are independent and reusable.
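The components listed above correspond to separate objects in ggplot2. The sketch below uses current ggplot2 syntax and the bundled `mpg` data (the variables and the linear smoother are illustrative choices, not taken from the talk). It spells out one layer in full with `layer()` (geom, stat, position, inheriting the plot's data and mapping), adds a second layer through a convenience function, and then adds a scale. Each piece is built independently before being combined with `+`.

```r
library(ggplot2)

# Default data and aesthetic mapping, inherited by every layer
base <- ggplot(mpg, aes(x = displ, y = hwy))

# A layer spelled out in full: geometric object, statistical
# transformation and position adjustment
points <- layer(geom = "point", stat = "identity", position = "identity")

# The same kind of component via a convenience function: a smoother
# layer whose stat fits a linear model
smooth <- geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# A scale controls the details of the mapping from hwy to the y axis
y_scale <- scale_y_continuous("Highway mpg")

# Components are independent objects, combined only here
p <- base + points + smooth + y_scale
print(p)
```

Because `points`, `smooth` and `y_scale` are ordinary R objects, the same layer or scale can be added to a different base plot without change.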
Layered grammar + ggplot2
Interesting ggplot examples:
James Cheshire, http://bit.ly/xqHhAs
Charlotte Wickham, http://cwick.co.nz/
David B Sparks, http://bit.ly/hn54NW
Claudia Beleites, http://bit.ly/yNqlpz
Poll: What resources are most helpful to you when improving your R skills?
Learning ggplot2
• ggplot2 mailing list: http://groups.google.com/group/ggplot2
• stackoverflow: http://stackoverflow.com/tags/ggplot2
• Lattice to ggplot2 conversion: http://learnr.wordpress.com/?s=lattice
• Cookbook for common graphics: http://wiki.stdout.org/rcookbook/Graphs/
• ggplot2 book: http://amzn.com/0387981403
Sneak peek
Poll: Why do you use visualisation?
# Getting started

# To get the CRAN version
install.packages("ggplot2")

# To get the development version
install.packages("devtools")
library(devtools)
dev_mode() # don't overwrite your existing install
# The repository now lives at tidyverse/ggplot2 on GitHub
install_github("tidyverse/ggplot2")
[Figure: the same plot drawn by the development version and by the CRAN version]
New geoms to deal with overplotting (by Winston Chang)
[Figure: hwy plotted against class for the mpg data]
[Figure: hwy against class, drawn with the default point geom]
qplot(class, hwy, data = mpg)
[Figure: hwy against class, drawn with jittered points]
qplot(class, hwy, data = mpg, geom = "jitter")
[Figure: violin plot of hwy by class]
qplot(class, hwy, data = mpg, geom = "violin")
[Figure: dot plot of hwy by class]
qplot(class, hwy, data = mpg, geom = "dotplot",
  stackdir = "center", binaxis = "y", stackratio = 1, binwidth = 1)
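Recent ggplot2 releases deprecate `qplot()`, so here is a sketch of the same three overplotting remedies written with `ggplot()` and explicit geoms. The jitter width and the dot plot bin width are arbitrary choices for the `mpg` data, not values from the talk.

```r
library(ggplot2)

base <- ggplot(mpg, aes(class, hwy))

# Jitter: add a little horizontal noise so tied values separate;
# height = 0 keeps the hwy values exact
p_jitter <- base + geom_jitter(width = 0.2, height = 0)

# Violin: a mirrored density estimate for each class
p_violin <- base + geom_violin()

# Dot plot: bin hwy along the y axis and stack dots about the centre
p_dots <- base + geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 1)

print(p_jitter)
print(p_violin)
print(p_dots)
```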
Better legends (by Kohske Takahashi)
[Figure: y against x, coloured by a continuous variable, with a colour legend from 0.5 to 2.5]
[Figure: the same plot with the default continuous colour legend]
df <- data.frame(x = runif(100), y = runif(100))
df$colour <- with(df, x ^ 2 + y + runif(100))
qplot(x, y, data = df, colour = colour)
[Figure: colour legend with its keys laid out in two rows]
qplot(x, y, data = df, colour = colour) +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))
[Figure: colour shown as a continuous colour bar]
qplot(x, y, data = df, colour = colour) + guides(colour = guide_colorbar())
qplot(x, y, data = df, colour = colour, alpha = I(1/4))

qplot(x, y, data = df, colour = colour, alpha = I(1/4)) +
  guides(colour = guide_legend(override.aes = list(alpha = 1, size = 2)))
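The guide calls above carry over to `ggplot()` syntax unchanged. A self-contained sketch that rebuilds a random data frame like `df` (the `set.seed()` call is added here only for reproducibility) and shows the three legend variants:

```r
library(ggplot2)

set.seed(2012)
df <- data.frame(x = runif(100), y = runif(100))
df$colour <- with(df, x^2 + y + runif(100))

base <- ggplot(df, aes(x, y, colour = colour))

# Continuous colour shown as a colour bar
p_bar <- base + geom_point() + guides(colour = guide_colourbar())

# The same scale shown as discrete keys laid out in two rows
p_keys <- base + geom_point() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

# Semi-transparent points, but fully opaque, larger keys in the legend
p_alpha <- base + geom_point(alpha = 1/4) +
  guides(colour = guide_legend(override.aes = list(alpha = 1, size = 2)))

print(p_bar)
print(p_keys)
print(p_alpha)
```

`override.aes` changes only how the keys are drawn; the points in the panel keep `alpha = 1/4`.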
# Better layout

df <- data.frame(x = 1:10, y = 10:1, colour = 1:2)
qplot(x, y, data = df) + coord_fixed()
qplot(x, y, data = df) + facet_wrap(~ colour)

# Internally, there has been a big rewrite of
# the facetting data processing and rendering
# systems. This lays the foundation for new
# features, and fixes some annoying long-standing
# bugs.
# Speed improvements

system.time(print(qplot(carat, price, data = diamonds)))

# Includes new tools for figuring out what's
# taking all the time
benchplot(qplot(carat, price, data = diamonds))

# See also geom_raster and geom_map

# Still a lot of work to do. The emphasis in
# ggplot2 is reducing the amount of thinking
# time by making it easier to go from the plot in
# your brain to the plot on the page.
30 s with geom_tile, 8 s with annotation_raster
library(ggplot2)
library(reshape2)
library(RgoogleMaps)
library(ggmap)

theft <- subset(crime, offense == "theft" & lat > 29 & lat < 30.2 & lon > -95.8)

lonr <- range(theft$lon)
latr <- range(theft$lat)

h_map <- GetMap.bbox(lonr, latr, size = c(1024, 1024))
h_raster <- as.raster(h_map$myTile)

# Map tile drawn as a single raster annotation
benchplot(ggplot(theft, aes(lon, lat)) +
  annotation_raster(h_raster, lonr[1], lonr[2], latr[1], latr[2]) +
  geom_density2d(colour = "black"))

# Same tile drawn as one geom_tile() cell per pixel
h_data <- melt(as.matrix(h_raster))
h_data$lat <- seq(latr[2], latr[1], length = 640)[h_data$Var1]
h_data$lon <- seq(lonr[1], lonr[2], length = 640)[h_data$Var2]

benchplot(ggplot(theft, aes(lon, lat)) +
  geom_tile(aes(fill = value), data = h_data) +
  scale_fill_identity() +
  geom_density2d(colour = "black"))
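The comparison above needs a downloaded map tile and the `crime` data. Below is a self-contained sketch of the same comparison: a synthetic random raster stands in for the map and random points stand in for the theft locations (both are hypothetical stand-ins, not the talk's data). It times an `annotation_raster()` background against a `geom_tile()` background with `benchplot()`. Timings depend on the machine and ggplot2 version, so none are claimed here.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-ins for the theft locations and the map tile
pts <- data.frame(lon = runif(500, -95.8, -95.0), lat = runif(500, 29, 30.2))
img <- matrix(runif(200 * 200), nrow = 200)

# Background drawn as a single raster grob: no per-cell data to process
p_raster <- ggplot(pts, aes(lon, lat)) +
  annotation_raster(as.raster(img), -95.8, -95.0, 29, 30.2) +
  geom_point(size = 0.5)

# Background drawn as one tile per cell: 40,000 rows of layer data.
# Row 1 of the matrix is placed at the top, as annotation_raster() does.
cells <- expand.grid(row = 1:200, col = 1:200)
cells$value <- img[cbind(cells$row, cells$col)]
cells$lat <- seq(30.2, 29, length.out = 200)[cells$row]
cells$lon <- seq(-95.8, -95.0, length.out = 200)[cells$col]
p_tile <- ggplot(pts, aes(lon, lat)) +
  geom_tile(aes(fill = value), data = cells) +
  scale_fill_gradient(low = "black", high = "white") +
  geom_point(size = 0.5)

# benchplot() draws each plot and returns a data frame of timings
# for the construct, build, render and draw steps plus a total
t_raster <- benchplot(p_raster)
t_tile <- benchplot(p_tile)
```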
ggplot2 0.9 scheduled for release on March 1
Poll: How big is your data?
# Future work: big visualisation
# (Sponsored by Revolution Analytics)

# How can you make a plot of 100 million
# observations?

# In less than one minute.
~100,000 points: 0.06 s to bin, 0.20 s to convert, 6.0 s to plot
~1.2 million points: 10 s to bin
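The timings above split plotting large data into binning, converting and drawing. Here is a minimal sketch of the bin step in base R, assuming fixed-width rectangular bins and using `findInterval()` and `tabulate()`. It illustrates the idea and is not the implementation the talk benchmarked. Only the per-bin counts, not the raw points, are passed on to be drawn as a single `geom_raster()` layer. `bin_2d()` is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical helper: count points into an nx-by-ny grid of equal-width bins
bin_2d <- function(x, y, nx = 100, ny = 100) {
  x_breaks <- seq(min(x), max(x), length.out = nx + 1)
  y_breaks <- seq(min(y), max(y), length.out = ny + 1)
  # rightmost.closed = TRUE puts the maximum value into the last bin
  ix <- findInterval(x, x_breaks, rightmost.closed = TRUE)
  iy <- findInterval(y, y_breaks, rightmost.closed = TRUE)
  # One integer id per cell, column-major like an R matrix
  cell <- ix + (iy - 1L) * nx
  counts <- tabulate(cell, nbins = nx * ny)
  list(
    x = head(x_breaks, -1) + diff(x_breaks) / 2,  # bin centres
    y = head(y_breaks, -1) + diff(y_breaks) / 2,
    count = matrix(counts, nrow = nx, ncol = ny)
  )
}

set.seed(1)
n <- 1e6
x <- rnorm(n)
y <- x + rnorm(n)
binned <- bin_2d(x, y)

# expand.grid() varies x fastest, matching the matrix's column-major order
grid <- expand.grid(x = binned$x, y = binned$y)
grid$count <- as.vector(binned$count)
p <- ggplot(grid, aes(x, y, fill = count)) + geom_raster()
print(p)
```

However many points go in, the plot only has to draw `nx * ny` cells, which is what keeps the drawing step cheap.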
Best practices
Poll: How do you learn about new packages?
Package best practices
• Namespace
• Documentation
• Unit tests
• Read the source!
• (ggplot2 is not always the best example: it was my second R package. I have now written around 30, and I now know a lot more!)
# Namespaces
library(ggplot2)
ddply # not found: loading ggplot2 does not attach plyr

# Note that plyr, reshape, etc. aren't automatically
# attached. This is good development practice:
# it's better to be explicit than implicit.

# Look at the NAMESPACE file.
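The exports declared in a NAMESPACE file can also be queried from R. The sketch below uses base R's `getNamespaceExports()` and the `::` operator. It assumes a current ggplot2, which imports the scales package (an assumption about today's dependency list, not something the talk states). Attaching ggplot2 does not put scales on the search path, so its functions are called explicitly.

```r
library(ggplot2)

# Names listed with export() in ggplot2's NAMESPACE
exports <- getNamespaceExports("ggplot2")
"ggplot" %in% exports
"aes" %in% exports

# Imported packages are loaded but not attached
"package:ggplot2" %in% search()
"package:scales" %in% search()

# Being explicit: call a function through its namespace
scales::rescale(c(0, 5, 10))
```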
export("%+%")
export(aes_all)
export(aes_auto)
export(aes_string)
export(aes)
export(annotate)
export(annotation_custom)
export(annotation_map)
export(annotation_raster)
export(autoplot)
export(benchplot)
export(borders)
export(continuous_scale)
export(coord_cartesian)
export(coord_equal)
export(coord_fixed)
export(coord_flip)
export(coord_map)
export(coord_polar)
...
# Unit tests
# Look in tests/ or inst/tests/
library(testthat)
test_package("ggplot2")
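Here is a sketch of the kind of test file that lives in those directories, using testthat's `test_that()` and `expect_equal()`. The behaviours checked (one row of layer data per observation, one panel per facet level) are illustrative choices, not tests taken from ggplot2's own suite.

```r
library(testthat)
library(ggplot2)

test_that("a point layer keeps one row per observation", {
  p <- ggplot(mpg, aes(displ, hwy)) + geom_point()
  expect_equal(nrow(layer_data(p)), nrow(mpg))
})

test_that("facetting by a variable creates one panel per level", {
  p <- ggplot(mpg, aes(displ, hwy)) + geom_point() + facet_wrap(~ drv)
  expect_equal(length(unique(layer_data(p)$PANEL)), length(unique(mpg$drv)))
})
```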
# Documentation
# Function level in man/
?geom_point
?facet_wrap
package?ggplot2

# Vignettes in inst/doc
# (ggplot2 doesn't have any)

# Publications
citation("ggplot2")
Questions