Introduction to ggplot2
Elegant Graphics for Data Analysis

Maik Röder
15.12.2011

RUGBCN and Barcelona Code Meetup

Slide footer: Friday 16 December 2011

Data Analysis Steps

• Prepare data

• e.g. using the reshape framework for restructuring data

• Plot data

• e.g. using ggplot2 instead of base graphics and lattice

• Summarize the data and refine the plots

• Iterative process


ggplot2

grammar of graphics


Grammar

• Oxford English Dictionary:

• The fundamental principles or rules of an art or science

• A book presenting these in methodical form. (Now rare; formerly common in the titles of books.)

• System of rules underlying a given language

• An abstraction which facilitates thinking, reasoning and communicating


The grammar of graphics

• Move beyond named graphics (e.g. “scatterplot”)

• gain insight into the deep structure that underlies statistical graphics

• Powerful and flexible system for

• constructing abstract graphs (set of points) mathematically

• Realizing physical representations as graphics by mapping aesthetic attributes (size, colour) to graphs

• Lacking openly available implementation


Specification

• DATA - data operations that create variables from datasets. Reshaping using an Algebra with operations

• TRANS - variable transformations

• SCALE - scale transformations

• ELEMENT - graphs and their aesthetic attributes

• COORD - a coordinate system

• GUIDE - one or more guides

Concise description of components of a graphic


Birth/Death Rate

Source: http://www.scalloway.org.uk/popu6.htm


Excess birth (vs. death) rates in selected countries

Source: The Grammar of Graphics, p.13

Grammar of Graphics

DATA: source("demographics")
DATA: longitude, latitude = map(source("World"))
TRANS: bd = max(birth - death, 0)
COORD: project.mercator()
ELEMENT: point(position(lon * lat), size(bd), color(color.red))
ELEMENT: polygon(position(longitude * latitude))

Source: The grammar of Graphics, p.13

The specification can be run in GPL, as implemented in SPSS


Rearrangement of Components

Grammar of Graphics:

• Data
• Trans
• Element
• Scale
• Guide
• Coord

Layered Grammar of Graphics:

• Defaults: Data, Mapping
• Layer: Data, Mapping, Geom, Stat, Position
• Scale
• Coord
• Facet


Layered Grammar of Graphics

w <- world
d <- demographics
d <- transform(d, bd = pmax(birth - death, 0))
p <- ggplot(d, aes(lon, lat))
p <- p + geom_polygon(data = w)
p <- p + geom_point(aes(size = bd), colour = "red")
p <- p + coord_map(projection = "mercator")
p

Implementation embedded in R using ggplot2


ggplot2

• Author: Hadley Wickham

• Open Source implementation of the layered grammar of graphics

• High-level R package for creating publication-quality statistical graphics

• Carefully chosen defaults following basic graphical design rules

• Flexible set of components for creating any type of graphics


ggplot2 installation

• In R console:

install.packages("ggplot2")
library(ggplot2)


qplot

• Quickly plot something with qplot

• for exploring ideas interactively

• Same options as plot converted to ggplot2

qplot(carat, price, data=diamonds, main = "Diamonds", asp = 1)


Exploring with qplot

First try:

qplot(carat, price, data=diamonds)

Log transform using functions on the variables:

qplot(log(carat), log(price), data=diamonds)


from qplot to ggplot

qplot(carat, price, data=diamonds, main = "Diamonds", asp = 1)

p <- ggplot(diamonds, aes(carat, price))
p <- p + geom_point()
# opts() has been removed from ggplot2; ggtitle() and theme() replace it
p <- p + ggtitle("Diamonds") + theme(aspect.ratio = 1)
p


Data and mapping

• If you need to flexibly restructure and aggregate data beforehand, use Reshape

• data is considered an independent concern

• Need a mapping that states which variable goes to which aesthetic

• weight => x, height => y, age => size

• Mappings are defined in scales
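The weight, height and age mapping above can be sketched with aes(). This is only an illustration: the data frame people and its values are made up for the example and do not come from the talk.

```r
library(ggplot2)

# Hypothetical data, invented only to illustrate the mapping.
people <- data.frame(
  weight = c(60, 72, 85, 90),
  height = c(160, 175, 180, 185),
  age    = c(25, 40, 33, 58)
)

# aes() declares which variable feeds which aesthetic:
# weight => x, height => y, age => size
p_map <- ggplot(people, aes(x = weight, y = height, size = age)) +
  geom_point()
p_map
```

The data frame stays untouched; only the aes() call decides how its columns are used, which is why data can be treated as an independent concern.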


Statistical Transformations

• a stat transforms data

• can add new variables to a dataset

• that can be used in aesthetic mappings
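A minimal sketch of a stat adding variables, assuming ggplot2 3.3.0 or later for after_stat() (the slides use the older ..density.. notation). The binning stat behind geom_histogram computes count and density for each bin, and layer_data() shows the data frame after the stat has run.

```r
library(ggplot2)

# density is not a column of diamonds; the binning stat creates it,
# and after_stat() maps it to the y aesthetic.
p_stat <- ggplot(diamonds, aes(carat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.2)

# Inspect the variables the stat added to the layer's data
binned <- layer_data(p_stat, 1)
head(binned[, c("x", "count", "density")])
```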


stat_smooth

• Fits a smoother to the data

• Displays a smooth and its standard error

ggplot(diamonds, aes(carat, price)) + geom_point() + geom_smooth()


Geometric Object

• Control the type of plot

• A geom can only display certain aesthetics
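As a rough illustration, the same data and mapping can be drawn with different geoms, and each geom lists the aesthetics it has defaults for. The sketch below uses the mpg dataset shipped with ggplot2; the variable names are chosen for the example.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(class, hwy))

as_points <- base_plot + geom_point()    # one point per car
as_boxes  <- base_plot + geom_boxplot()  # one box per class

# Each geom understands its own set of aesthetics:
# points have a shape, lines do not.
names(GeomPoint$default_aes)
names(GeomLine$default_aes)
```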


geom_histogram

ggplot(diamonds, aes(carat)) + geom_histogram()

• Distribution of carats shown in a histogram


Position adjustments

• Tweak positioning of geometric objects

• Avoid overlaps
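Besides jittering, bar charts are a common place to see position adjustments at work. The following sketch, using the diamonds data, compares three adjustments on the same layer; it is an illustration and not taken from the talk.

```r
library(ggplot2)

bars <- ggplot(diamonds, aes(cut, fill = clarity))

stacked <- bars + geom_bar()                    # default: bars stacked
dodged  <- bars + geom_bar(position = "dodge")  # bars side by side
filled  <- bars + geom_bar(position = "fill")   # stacked and rescaled to 1

dodged
```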


position_jitter

x <- c(0, 0, 0, 0, 0)
y <- c(0, 0, 0, 0, 0)
overplotted <- data.frame(x, y)
ggplot(overplotted, aes(x,y)) +
  geom_point(position=position_jitter(w=0.1, h=0.1))

• Avoid overplotting by jittering points


Scales

• Control mapping from data to aesthetic attributes

• One scale per aesthetic
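A small sketch of "one scale per aesthetic", using the mpg dataset shipped with ggplot2. The axis names and colours are arbitrary choices for the example.

```r
library(ggplot2)

# Three aesthetics (x, y, colour), so three scales, each set independently.
p_scale <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  scale_x_continuous(name = "Engine displacement (l)") +
  scale_y_log10(name = "Highway mpg") +
  scale_colour_manual(values = c("4" = "black", "f" = "red", "r" = "blue"))
p_scale
```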


scale_x_continuous, scale_y_continuous

x <- c(0, 0, 0, 0, 0)
y <- c(0, 0, 0, 0, 0)
overplotted <- data.frame(x, y)
ggplot(overplotted, aes(x,y)) +
  geom_point(position=position_jitter(w=0.1, h=0.1)) +
  scale_x_continuous(limits=c(-1,1)) +
  scale_y_continuous(limits=c(-1,1))


Coordinate System

• Maps the position of objects into the plane

• Affect all position variables simultaneously

• Change appearance of geoms (unlike scales)
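To see how a coordinate system changes the appearance of a geom, the same bar layer can be drawn in different coordinate systems. This is a sketch on the diamonds data, not an example from the talk.

```r
library(ggplot2)

# A single stacked bar: the layer itself never changes below.
bar <- ggplot(diamonds, aes(x = factor(1), fill = cut)) +
  geom_bar(width = 1)

pie     <- bar + coord_polar(theta = "y")  # same bar drawn as a pie chart
flipped <- bar + coord_flip()              # same bar with x and y swapped

pie
```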


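coord_map

library("maps")
# keep only the x and y coordinates of the New Zealand outline
map <- map("nz", plot=FALSE)[c("x","y")]
m <- data.frame(map)
n <- qplot(x, y, data=m, geom="path")
n
# name the columns x and y so the point layer can inherit the mapping
d <- data.frame(x = 0, y = 0)
n + geom_point(data = d, colour = "red")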


Faceting

• lay out multiple plots on a page

• split data into subsets

• plot subsets into different panels


Facet Types

• 2D grid of panels
• 1D ribbon of panels wrapped into 2D
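In ggplot2 the two facet types correspond to facet_grid() and facet_wrap(). A minimal sketch on the mpg dataset, with the faceting variables picked for the example:

```r
library(ggplot2)

p_facet <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# 2D grid: rows by drive train, columns by number of cylinders
grid_plot <- p_facet + facet_grid(drv ~ cyl)

# 1D ribbon of panels (one per class), wrapped into three columns
wrap_plot <- p_facet + facet_wrap(~ class, ncol = 3)

wrap_plot
```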


Faceting

aesthetics <- aes(carat, ..density..)
p <- ggplot(diamonds, aesthetics)
p <- p + geom_histogram(binwidth = 0.2)
p + facet_grid(clarity ~ cut)


Faceting Formula

no faceting                                  . ~ .
single row, multiple columns                 . ~ a
single column, multiple rows                 b ~ .
multiple rows and columns                    a ~ b
multiple variables in rows and/or columns    . ~ a + b, a + b ~ ., a + b ~ c + d
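The formula variants can be tried out on the built-in mtcars data. In this sketch cyl and am stand in for the placeholder variables a and b; the choice of variables is an assumption made for the example.

```r
library(ggplot2)

p_form <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

one_row   <- p_form + facet_grid(. ~ cyl)       # single row, one column per cyl
one_col   <- p_form + facet_grid(am ~ .)        # single column, one row per am
rows_cols <- p_form + facet_grid(am ~ cyl)      # rows by am, columns by cyl
nested    <- p_form + facet_grid(. ~ cyl + am)  # two variables in the columns

rows_cols
```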


Scales in Facets

scales value    free axes
fixed           -
free            x, y
free_x          x
free_y          y

facet_grid(. ~ cyl, scales="free_x")
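The facet_grid() call above needs a plot to attach to. A complete sketch on the built-in mtcars data, with mpg and wt chosen as example variables, compares fixed and free x scales:

```r
library(ggplot2)

p_free <- ggplot(mtcars, aes(mpg, wt)) + geom_point()

# Default: every panel shares the same x and y ranges
fixed_plot <- p_free + facet_grid(. ~ cyl)

# free_x: each column gets an x range fitted to its own data
free_x_plot <- p_free + facet_grid(. ~ cyl, scales = "free_x")

free_x_plot
```

With free_x the panels are easier to read individually, but positions along x can no longer be compared directly across panels.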


Layers

• Iteratively update a plot

• change a single feature at a time

• Think about the high level aspects of the plot in isolation

• Instead of choosing a static type of plot, create new types of plots on the fly

• Cure against immobility

• Developers can easily develop new layers without affecting other layers
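The iterative workflow can be sketched by adding one layer at a time to a plot object. The example uses the mpg dataset; the linear smoother is just one possible second layer.

```r
library(ggplot2)

# Start with data and mapping only: no layers yet
p_layers <- ggplot(mpg, aes(displ, hwy))

# Add one feature at a time
p_layers <- p_layers + geom_point(alpha = 0.3)
p_layers <- p_layers + geom_smooth(method = "lm", formula = y ~ x)

# The plot object keeps its layers as a list
length(p_layers$layers)
p_layers
```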


Hierarchy of defaults

Omitted layer    Default chosen by layer
Stat             Geom
Geom             Stat
Mapping          Plot default
Coord            Cartesian coordinates
Scale            Chosen depending on aesthetic and type of variable
Position         Linear scaling for continuous variables; integers for categorical variables
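The first two rows of the table can be checked by inspecting layer objects. This sketch relies on layers exposing their stat, geom and position as fields, which holds for the ggproto-based versions of ggplot2 as far as I know.

```r
library(ggplot2)

# Stat omitted: the geom supplies a default stat
class(geom_histogram()$stat)[1]

# Geom omitted: the stat supplies a default geom
class(stat_smooth()$geom)[1]

# Position omitted: geom_point() falls back to no adjustment
class(geom_point()$position)[1]
```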


Thanks!

• Visit the ggplot2 homepage:

• http://had.co.nz/ggplot2/

• Get the ggplot2 book:

• http://amzn.com/0387981403

• Get the Grammar of Graphics book from Leland Wilkinson:

• http://amzn.com/0387245448
