ggplot2: A backstage tour

Hadley Wickham
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics, Rice University

February 2012 (Wednesday, February 8, 12)

Ggplot2 Intro


Page 2: Ggplot2 Intro

1. Why ggplot2?

2. Sneak peek and new features

3. Best practices

4. Questions


Page 3: Ggplot2 Intro

Poll: What graphics system are you currently using?


Page 4: Ggplot2 Intro

Why ggplot2?


Page 5: Ggplot2 Intro

[Figure: "WHC" plot of whc against day, 2004]

Page 6: Ggplot2 Intro


Page 7: Ggplot2 Intro

“Nothing is as practical as a good theory”—Kurt Lewin

“[A good model] will bring together in a coherent way things that previously appeared unrelated and which also will provide a basis for dealing systematically with new situations”—David Cox


Page 8: Ggplot2 Intro

A plot is made up of multiple layers.
A layer consists of data, a set of mappings between variables and aesthetics, a geometric object and a statistical transformation.
Scales control the details of the mapping.
All components are independent and reusable.
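The layered grammar maps directly onto ggplot2 code. The following is a minimal sketch using the `mpg` data that ships with ggplot2; the variable choices, the `lm` smoother and the `Set2` palette are arbitrary assumptions made for illustration. It builds one plot from data plus mappings, two layers and a scale, then reuses one of the layers in a second plot:

```r
library(ggplot2)

# Data plus a default mapping of variables to aesthetics
base <- ggplot(mpg, aes(x = displ, y = hwy))

# Each layer pairs a geometric object with a statistical transformation;
# stored as standalone objects so they can be reused
pts   <- geom_point(aes(colour = class))        # geom "point", stat "identity"
trend <- geom_smooth(method = "lm", se = FALSE)  # geom "smooth", stat "smooth"

# A scale controls the details of the colour mapping
p <- base + pts + trend + scale_colour_brewer(palette = "Set2")

# Components are reusable: the same point layer on a different base plot
q <- ggplot(mpg, aes(x = cty, y = hwy)) + pts
```

Because each component is an ordinary R object, `pts` can be added to any plot whose data contains the variables it maps.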


Page 9: Ggplot2 Intro

Interesting ggplot example

Layered grammar + ggplot2

James Cheshire, http://bit.ly/xqHhAs

Page 10: Ggplot2 Intro

Charlotte Wickham, http://cwick.co.nz/

Page 11: Ggplot2 Intro

David B Sparks, http://bit.ly/hn54NW

Page 12: Ggplot2 Intro

Claudia Beleites, http://bit.ly/yNqlpz

Page 13: Ggplot2 Intro

Poll: What resources are most helpful to you when improving your R skills?


Page 14: Ggplot2 Intro

Learning ggplot2

• ggplot2 mailing list: http://groups.google.com/group/ggplot2
• stackoverflow: http://stackoverflow.com/tags/ggplot2
• Lattice to ggplot2 conversion: http://learnr.wordpress.com/?s=lattice
• Cookbook for common graphics: http://wiki.stdout.org/rcookbook/Graphs/
• ggplot2 book: http://amzn.com/0387981403


Page 15: Ggplot2 Intro

Sneak peek


Page 16: Ggplot2 Intro

Poll: Why do you use visualisation?


Page 17: Ggplot2 Intro

# Getting started

# To get the CRAN version
install.packages("ggplot2")

# To get the development version
install.packages("devtools")
library(devtools)
dev_mode() # don't overwrite your existing install
install_github("tidyverse/ggplot2") # current devtools needs "user/repo"


Page 18: Ggplot2 Intro

Development version

CRAN version


Page 19: Ggplot2 Intro

[Figure: hwy against class for the mpg data]

New geoms to deal with overplotting (by Winston Chang)

Page 20: Ggplot2 Intro

[Figure: hwy against class for the mpg data, drawn as points]

qplot(class, hwy, data = mpg)

New geoms to deal with overplotting (by Winston Chang)

Page 21: Ggplot2 Intro

[Figure: hwy against class for the mpg data, drawn as jittered points]

qplot(class, hwy, data = mpg, geom = "jitter")

Page 22: Ggplot2 Intro

[Figure: violin plots of hwy by class for the mpg data]

qplot(class, hwy, data = mpg, geom = "violin")

Page 23: Ggplot2 Intro

[Figure: dot plot of hwy by class for the mpg data]

Page 24: Ggplot2 Intro

[Figure: centred dot plot of hwy by class for the mpg data]

qplot(class, hwy, data = mpg, geom = "dotplot", stackdir = "center",
  binaxis = "y", stackratio = 1, binwidth = 1)
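The `qplot()` calls above can also be written with `ggplot()`, which makes the layer structure explicit; `qplot()` is deprecated in recent ggplot2 releases. A sketch of the same four displays, keeping the dot plot parameters from the slide:

```r
library(ggplot2)

# Same data and mapping throughout; only the geom changes
base <- ggplot(mpg, aes(x = class, y = hwy))

p_points <- base + geom_point()    # heavy overplotting within each class
p_jitter <- base + geom_jitter()   # random displacement to separate points
p_violin <- base + geom_violin()   # mirrored density estimate per class
p_dots   <- base + geom_dotplot(binaxis = "y", stackdir = "center",
                                stackratio = 1, binwidth = 1)
```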

Page 25: Ggplot2 Intro

[Figure: y against x, points coloured by a continuous variable, with a colour legend from 0.5 to 2.5]

Better legends (by Kohske Takahashi)

Page 26: Ggplot2 Intro

[Figure: y against x, points coloured by a continuous variable, with a colour legend from 0.5 to 2.5]

df <- data.frame(x = runif(100), y = runif(100))
df$colour <- with(df, x ^ 2 + y + runif(100))

qplot(x, y, data = df, colour = colour)

Better legends (by Kohske Takahashi)

Page 27: Ggplot2 Intro

[Figure: y against x, with the colour legend keys arranged in two rows]

qplot(x, y, data = df, colour = colour) +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

Page 28: Ggplot2 Intro

[Figure: y against x, with a colour bar legend]

qplot(x, y, data = df, colour = colour) + guides(colour = guide_colorbar())

Page 29: Ggplot2 Intro

qplot(x, y, data = df, colour = colour, alpha = I(1/4))

Page 30: Ggplot2 Intro

qplot(x, y, data = df, colour = colour, alpha = I(1/4)) +
  guides(colour = guide_legend(override.aes = list(alpha = 1, size = 2)))
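For comparison, a sketch of the legend examples written with `ggplot()` and an explicit `geom_point()` layer. The `set.seed()` call is an addition for reproducibility. With `ggplot()`, a fixed alpha is passed as a layer parameter instead of being wrapped in `I()`:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = runif(100), y = runif(100))
df$colour <- with(df, x ^ 2 + y + runif(100))

base <- ggplot(df, aes(x, y, colour = colour))

# Continuous colour shown as discrete legend keys, laid out in two rows
p_rows <- base + geom_point() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

# Continuous colour shown as a colour bar
p_bar <- base + geom_point() + guides(colour = guide_colourbar())

# Transparent points in the panel, but opaque, larger keys in the legend
p_alpha <- base + geom_point(alpha = 1/4) +
  guides(colour = guide_legend(override.aes = list(alpha = 1, size = 2)))
```

`override.aes` only changes how the legend keys are drawn; the points in the panel keep the alpha set on the layer.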

Page 31: Ggplot2 Intro

# Better layout

df <- data.frame(x = 1:10, y = 10:1, colour = 1:2)
qplot(x, y, data = df) + coord_fixed()
qplot(x, y, data = df) + facet_wrap(~ colour)

# Internally, there has been a big rewrite of
# the facetting data processing and rendering
# systems. This lays the foundation for new
# features, and fixes some annoying long-standing
# bugs.


Page 32: Ggplot2 Intro

# Speed improvements

system.time(print(qplot(carat, price, data = diamonds)))

# Includes new tools for figuring out what's
# taking all the time
benchplot(qplot(carat, price, data = diamonds))

# See also geom_raster and geom_map

# Still a lot of work to do. The emphasis in
# ggplot2 is reducing the amount of thinking
# time by making it easier to go from the plot in
# your brain to the plot on the page.


Page 33: Ggplot2 Intro

30 s with geom_tile, 8 s with annotation_raster

Page 34: Ggplot2 Intro

library(ggplot2)
library(reshape2)
library(RgoogleMaps)
library(ggmap)

theft <- subset(crime, offense == "theft" & lat > 29 & lat < 30.2 & lon > -95.8)

lonr <- range(theft$lon)
latr <- range(theft$lat)

h_map <- GetMap.bbox(lonr, latr, size = c(1024, 1024))
h_raster <- as.raster(h_map$myTile)

benchplot(ggplot(theft, aes(lon, lat)) +
  annotation_raster(h_raster, lonr[1], lonr[2], latr[1], latr[2]) +
  geom_density2d(colour = "black"))

# One row per pixel; use the raster's actual dimensions
# rather than assuming a 640 x 640 tile
h_data <- melt(as.matrix(h_raster))
h_data$lat <- seq(latr[2], latr[1], length = nrow(h_raster))[h_data$Var1]
h_data$lon <- seq(lonr[1], lonr[2], length = ncol(h_raster))[h_data$Var2]

benchplot(ggplot(theft, aes(lon, lat)) +
  geom_tile(aes(fill = value), data = h_data) +
  scale_fill_identity() +
  geom_density2d(colour = "black"))
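The timing difference comes from how the image is drawn: `annotation_raster()` draws it as a single image, while `geom_tile()` draws one rectangle per pixel. The sketch below sets up the same comparison without downloading a map; the generated colour grid (`img`), the random points (`pts`) and the grid size `n` are hypothetical stand-ins for the map tile and the crime data, and no particular timings are implied:

```r
library(ggplot2)

# Hypothetical stand-in for the map tile: an n x n matrix of colours
n <- 200
img <- matrix(hcl(h = seq(0, 360, length.out = n * n)), nrow = n)
img_raster <- as.raster(img)

set.seed(1)
pts <- data.frame(x = runif(1000), y = runif(1000))

# 1. The whole image drawn as a single raster behind the points
p_raster <- ggplot(pts, aes(x, y)) +
  annotation_raster(img_raster, xmin = 0, xmax = 1, ymin = 0, ymax = 1) +
  geom_point()

# 2. The same image drawn as one rectangle per pixel (n * n tiles)
tiles <- expand.grid(row = seq_len(n), col = seq_len(n))
tiles$fill <- img[cbind(tiles$row, tiles$col)]
tiles$x <- (tiles$col - 0.5) / n       # columns run left to right
tiles$y <- 1 - (tiles$row - 0.5) / n   # row 1 is the top of the image
p_tile <- ggplot(pts, aes(x, y)) +
  geom_tile(aes(fill = fill), data = tiles) +
  scale_fill_identity() +
  geom_point()

# benchplot(p_raster)
# benchplot(p_tile)
```

Running `benchplot()` on each plot (the commented lines) breaks the time into construction, building, rendering and drawing, so the cost of the per-pixel tiles should show up in the later stages.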


Page 35: Ggplot2 Intro

ggplot2 0.9 scheduled for release on March 1


Page 36: Ggplot2 Intro

Poll: How big is your data?


Page 37: Ggplot2 Intro

# Future work: big visualisation
# (Sponsored by Revolution Analytics)

# How can you make a plot of 100 million
# observations?

# In less than one minute.


Page 38: Ggplot2 Intro


Page 39: Ggplot2 Intro


Page 40: Ggplot2 Intro


Page 41: Ggplot2 Intro


Page 42: Ggplot2 Intro


Page 43: Ggplot2 Intro

~100,000 points
0.06 s to bin
0.20 s to convert
6.0 s to plot


Page 44: Ggplot2 Intro

~100,000 points
0.06 s to bin
0.20 s to convert
6.0 s to plot

~1.2 million
10 s to bin
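One general way to approach plots of very large data is to bin first and plot only the summaries. This is assumed here for illustration and is not necessarily how the project described above works. The sketch counts one million simulated points into fixed-width cells with a single `tabulate()` pass, then draws one raster cell per bin; the simulated data and the bin width of 0.1 are arbitrary assumptions:

```r
library(ggplot2)

set.seed(1)
n <- 1e6
x <- rnorm(n)
y <- x + rnorm(n)

# Fixed-width bins: map each observation to an integer (column, row) index
width <- 0.1
bx <- floor((x - min(x)) / width) + 1
by <- floor((y - min(y)) / width) + 1
nx <- max(bx)
ny <- max(by)

# Count observations per cell in one pass; cell (i, j) -> i + (j - 1) * nx
counts <- tabulate(bx + (by - 1) * nx, nbins = nx * ny)

# One row per cell (including empty ones), positioned at the cell centre,
# in the same order as the cell index used above
binned <- data.frame(
  x = min(x) + (rep(seq_len(nx), times = ny) - 0.5) * width,
  y = min(y) + (rep(seq_len(ny), each = nx) - 0.5) * width,
  count = counts
)

# The plot now has nx * ny cells instead of n points
p <- ggplot(binned, aes(x, y, fill = count)) + geom_raster()
```

The number of graphical elements depends on the bin grid (`nx * ny`), not on `n`, which is why binning and plotting can be timed separately, as in the timings above.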


Page 45: Ggplot2 Intro

Best practices


Page 46: Ggplot2 Intro

Poll: How do you learn about new packages?


Page 47: Ggplot2 Intro

Package best practices

• Namespace
• Documentation
• Unit tests
• Read the source!

• (ggplot2 is not always the best example: it was my second R package, and I have now written around 30. I now know a lot more!)


Page 48: Ggplot2 Intro


Page 49: Ggplot2 Intro

# Namespaces

library(ggplot2)
ddply

# Note that plyr, reshape etc. aren't automatically
# attached. This is good development practice:
# it's better to be explicit than implicit.

# Look at the NAMESPACE file.


Page 50: Ggplot2 Intro

export("%+%")
export(aes_all)
export(aes_auto)
export(aes_string)
export(aes)
export(annotate)
export(annotation_custom)
export(annotation_map)
export(annotation_raster)
export(autoplot)
export(benchplot)
export(borders)
export(continuous_scale)
export(coord_cartesian)
export(coord_equal)
export(coord_fixed)
export(coord_flip)
export(coord_map)
export(coord_polar)
...


Page 51: Ggplot2 Intro

# Unit tests

# Look in tests/ or inst/tests/

library(testthat)
test_package("ggplot2")
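A minimal sketch of what a test file might contain, written with testthat's `test_that()` and `expect_*()` functions. The plot and the expectations are hypothetical examples, not taken from ggplot2's own test suite:

```r
library(testthat)
library(ggplot2)

# Hypothetical plot under test: two layers added in a known order
p <- ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth(method = "lm")

test_that("layers are stored in the order they are added", {
  expect_length(p$layers, 2)
  expect_s3_class(p$layers[[1]]$geom, "GeomPoint")
  expect_s3_class(p$layers[[2]]$stat, "StatSmooth")
})
```

Saved as a file in the package's test directory, tests like this are what `test_package()` runs.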


Page 52: Ggplot2 Intro

# Documentation

# Function level in man/
?geom_point
?facet_wrap
package?ggplot2

# Vignettes in inst/doc
# (ggplot2 doesn't have any)

# Publications
citation("ggplot2")


Page 53: Ggplot2 Intro

Questions

