An introduction to ggplot2 · 2016. 7. 11.


An introduction to ggplot2

Claude Renaux

Seminar for Statistics, ETH Zurich

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 1 / 33

Table of Contents

1 Overview

2 Selection of geoms

3 Multiple Layers and Faceting

4 Aesthetics

5 Tailoring Graphics


ggplot2: plots of one variable

In this section we will . . .

. . . get started with ggplot2

. . . look at plots of one variable


Must have

Useful cheatsheet: https://www.rstudio.com/resources/cheatsheets/ (pick "Data Visualization with ggplot2")

Source: link above. This image is under Creative Commons licence.


Why ggplot2?

To mention some advantages:

nice labels

nice colors

small margins

beautiful faceting or multipanel plots

very powerful and flexible: we’ll have a glimpse at the grammar of graphics

we can change or update plots


Why ggplot2?

To mention some disadvantages:

ggplot2 can only deal with data.frames

default plots of model outputs are normally not possible

ggplot2 is not optimized for the best speed performance

3-D plots are not possible


Overview: plots of one variable

One continuous variable

histogram: geom_histogram()

densities: geom_density()

frequency plot: geom_freqpoly()

One discrete variable

barplot: geom_bar()

pie chart: a bar plot drawn in a different (polar) coordinate system, coord_polar()
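The geoms listed above can be tried out on the mpg data set that ships with ggplot2. The following is only a sketch: the choice of the variables hwy and class, the binwidth of 2 and the object names are assumptions made for illustration.

```r
library(ggplot2)

# One continuous variable: highway miles per gallon (hwy)
p_hist <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 2)
p_dens <- ggplot(mpg, aes(x = hwy)) + geom_density()
p_freq <- ggplot(mpg, aes(x = hwy)) + geom_freqpoly(binwidth = 2)

# One discrete variable: vehicle class
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# Pie chart: a single stacked bar drawn in polar coordinates
p_pie <- ggplot(mpg, aes(x = factor(1), fill = class)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
```

Each object is drawn by typing its name at the console or by calling print() on it, for example print(p_hist).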


Package: ggplot2

There are two important functions.

qplot(): quick plots, similar to the base plotting functions

ggplot(): the main function, which we will look at more closely


Let’s have a look — mpg data set

Let’s look at the mpg data set from ggplot2. It contains 234 observations on the fuel efficiency of 38 popular car models in 1999 and 2008.

> ?mpg
> str(mpg)

Let’s look at the demo code.
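Besides ?mpg and str(mpg), a few base R calls give a quick feel for the data. This is a sketch only; which columns are summarised is an arbitrary choice.

```r
library(ggplot2)

dim(mpg)                    # number of observations and variables
names(mpg)                  # variable names
length(unique(mpg$model))   # number of distinct car models
table(mpg$year)             # observations per model year

# Summary of three numeric variables used later in the slides
summary(mpg[, c("displ", "cty", "hwy")])
```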


ggplot2: plots of two variables

In this section we will . . .

. . . glimpse at grammar of graphics underlying ggplot2

. . . create plots with two variables


Glimpse at grammar of graphics

The "gg" in ggplot2 stands for grammar of graphics; the package builds on Wilkinson’s (2005) grammar of graphics.

The layered grammar is useful because . . .

it is a generic way of creating a plot

it does not rely on a specific or customized graphic for a particular problem

plots can be created or updated iteratively

layers can be added
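Iterative updating could look roughly as follows. The object names and the linear smoother (method = "lm") are assumptions chosen for illustration.

```r
library(ggplot2)

# Start with the data and the aesthetic mappings only; there is no layer yet
p <- ggplot(data = mpg, aes(x = hwy, y = cty))

# Adding a layer returns a new plot object; p itself is unchanged
p_points <- p + geom_point()

# Update again: add a second layer and axis labels
p_final <- p_points +
  geom_smooth(method = "lm") +
  labs(x = "Highway miles per gallon", y = "City miles per gallon")
```

Because every intermediate step is an ordinary R object, a plot can be stored, extended later and printed at any stage.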


Glimpse at grammar of graphics

Idea: all plots can be built from the same components

data set

geoms (geometric objects): visually represent the observations

aes (aesthetics): aesthetic properties of the geometric objects

scales: define how data values are mapped to aesthetic values

coordinate system
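To see how the components fit together, each one can be written out explicitly in a single call. This is a sketch; the variables, the "Dark2" palette and the axis limits are assumptions chosen for illustration.

```r
library(ggplot2)

p <- ggplot(data = mpg,                                        # data set
            mapping = aes(x = displ, y = hwy, color = drv)) +  # aesthetics
  geom_point() +                                               # geom
  scale_color_brewer(palette = "Dark2") +                      # scale
  coord_cartesian(ylim = c(10, 45))                            # coordinate system
```

If the scale and the coordinate system are left out, ggplot2 falls back to defaults, which is why most plots in these slides only name the data, the aesthetics and the geoms.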


Glimpse at grammar of graphics

Source: the Data Visualization Cheat Sheet from https://www.rstudio.com/resources/cheatsheets/. This image is under a Creative Commons licence.


Overview: plots of two variables

Two continuous variables

scatter plot: geom_point()

scatter plot using jitter: geom_jitter()

smoother: geom_smooth()

Discrete x and continuous y

boxplot: geom_boxplot()

bar plot: geom_bar(stat = "identity")

Continuous function like time series

line plot: geom_line()
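A minimal sketch of these geoms, using mpg and the economics data set that also ships with ggplot2. The chosen variables and the helper data frame mean_hwy are assumptions made for illustration.

```r
library(ggplot2)

# Two continuous variables
p_point  <- ggplot(mpg, aes(x = hwy, y = cty)) + geom_point()
p_jitter <- ggplot(mpg, aes(x = hwy, y = cty)) + geom_jitter()
p_smooth <- ggplot(mpg, aes(x = hwy, y = cty)) + geom_point() + geom_smooth()

# Discrete x and continuous y
p_box <- ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot()

# Bar plot of precomputed values: mean hwy per class
mean_hwy <- aggregate(hwy ~ class, data = mpg, FUN = mean)
p_bar <- ggplot(mean_hwy, aes(x = class, y = hwy)) + geom_bar(stat = "identity")

# Time series: number of unemployed over time
p_line <- ggplot(economics, aes(x = date, y = unemploy)) + geom_line()
```

stat = "identity" tells geom_bar() to use the y values as bar heights instead of counting rows, which is what is needed when the heights are already computed.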


Let’s have a look — mpg data set

Let’s look at the demo code.


ggplot2: multiple layers and faceting

In this section we will . . .

. . . look at multiple layers

. . . consider faceting or multipanel conditioning plots


Multiple layers

Add multiple layers:

> # variable cty is the city miles per gallon
> ggplot(data = mpg, aes(x = hwy, y = cty)) +
+   geom_jitter() +
+   geom_smooth() +
+   geom_rug(sides = "bl", position = "jitter")

[Figure: jittered scatter plot of cty against hwy with a smoother and rug marks on the bottom and left axes.]

Faceting

Let’s look at multi-panel plots:

> # subset of the mpg data set
> mpg.small <- subset(mpg, manufacturer %in%
+                       c("ford", "land rover", "toyota",
+                         "chevrolet", "honda", "volkswagen"))
> ggplot(data = mpg.small, aes(x = hwy, y = cty)) +
+   geom_jitter() +
+   facet_wrap( ~ manufacturer)

[Figure: jittered scatter plots of cty against hwy in six panels, one per manufacturer: chevrolet, ford, honda, land rover, toyota, volkswagen.]
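Faceting has a few common variants. The sketch below goes slightly beyond the slide: facet_grid() and the ncol and scales arguments are standard ggplot2, but the faceting variables are assumptions chosen for illustration.

```r
library(ggplot2)

p <- ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter()

# facet_wrap(): panels for one variable, wrapped into a chosen number of columns
p_wrap <- p + facet_wrap(~ drv, ncol = 3)

# facet_grid(): rows defined by one variable, columns by another
p_grid <- p + facet_grid(drv ~ cyl)

# By default all panels share the same axes; scales = "free" relaxes this
p_free <- p + facet_wrap(~ class, scales = "free")
```

Shared axes make panels directly comparable, so free scales are best kept for cases where the groups differ strongly in range.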

Let’s have a look at the aesthetics

In this section we will have a look at the aesthetics . . .

. . . size

. . . shape

. . . color

. . . and combine them


size

> ggplot(data = mpg, aes(x = hwy, y = cty, size = displ)) +
+   geom_jitter()
> # displ: engine displacement, in litres

[Figure: jittered scatter plot of cty against hwy; point size shows displ (legend from 2 to 7).]

shape

> ggplot(data = mpg, aes(x = hwy, y = cty, shape = factor(cyl))) +
+   geom_jitter(size = 3)
> # one can set a fixed size for all the points

[Figure: jittered scatter plot of cty against hwy; point shape shows factor(cyl) (4, 5, 6, 8).]
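The difference between mapping an aesthetic to a variable and setting it to a fixed value can be sketched as follows. The object names and the color "blue" are assumptions for illustration.

```r
library(ggplot2)

# Mapping: size depends on a variable, so it goes inside aes() and gets a legend
p_mapped <- ggplot(mpg, aes(x = hwy, y = cty)) +
  geom_jitter(aes(size = displ))

# Setting: a fixed value for all points goes outside aes(); no legend is drawn
p_set <- ggplot(mpg, aes(x = hwy, y = cty)) +
  geom_jitter(size = 3, color = "blue")

# Common slip: a constant inside aes() is treated as data, not as a color name,
# so the points get a default color and a legend entry labelled "blue"
p_slip <- ggplot(mpg, aes(x = hwy, y = cty)) +
  geom_jitter(aes(color = "blue"))
```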

color

> ggplot(data = mpg, aes(x = hwy, y = cty, color = factor(cyl))) +
+   geom_jitter(size = 3)

[Figure: jittered scatter plot of cty against hwy; point color shows factor(cyl) (4, 5, 6, 8).]

Combine them

> ggplot(data = mpg, aes(x = hwy, y = cty, color = factor(cyl),
+                        shape = factor(cyl), size = displ)) +
+   geom_jitter()
> # nice, there is only one combined legend for shape and color

[Figure: jittered scatter plot of cty against hwy; point size shows displ (legend from 2 to 7), color and shape show factor(cyl) (4, 5, 6, 8) in one combined legend.]

The logic . . .

In this section we will look at . . .

. . . where to put what?

. . . the order of the layers

. . . how to save a plot


Where to put what?

All the arguments specified in the function ggplot() are passed to all the layers.

This holds unless specified otherwise in a layer.

Arguments specified in a layer affect only that layer.
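A layer overriding what it inherits from ggplot() could look like this. It is a sketch only; the black smoother and the use of inherit.aes = FALSE are choices made for illustration.

```r
library(ggplot2)

# color = drv is specified in ggplot(), so every layer inherits it by default
p <- ggplot(data = mpg, aes(x = hwy, y = cty, color = drv)) +
  geom_jitter() +                                # inherits x, y and color
  geom_smooth(aes(group = 1), color = "black")   # overrides: one black smoother

# A layer can also ignore the mappings from ggplot() entirely
p2 <- ggplot(data = mpg, aes(x = hwy, y = cty, color = drv)) +
  geom_jitter() +
  geom_smooth(aes(x = hwy, y = cty), inherit.aes = FALSE)
```

In both plots the points are colored by drv, but only one smoother is fitted because the smoothing layer no longer uses the color grouping.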


Where to put what?

Basic plot to start with:

> ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter() + geom_smooth()

Color both points and smoothers per group of drv ⇒ three smoothers are fitted:

> ggplot(data = mpg, aes(x = hwy, y = cty, color = drv)) + geom_jitter() +
+   geom_smooth()

[Figure: jittered scatter plot of cty against hwy; points and three smoothers colored by drv (4, f, r).]

Where to put what?

Color the points per group of drv ⇒ one smoother is fitted:

> ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter(aes(color = drv)) +
+   geom_smooth()

[Figure: jittered scatter plot of cty against hwy; points colored by drv (4, f, r), with a single smoother.]

Where to put what?

Color the smoothers per group of drv ⇒ three smoothers are fitted:

> ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter() +
+   geom_smooth(aes(color = drv))

[Figure: jittered scatter plot of cty against hwy; uncolored points, with three smoothers colored by drv (4, f, r).]

Does the order of the layers matter?

Plot the points first as a layer and then add the layer with boxplots:

> ggplot(data = mpg, aes(x = class, y = cty)) +
+   geom_jitter(alpha = 0.4) +
+   geom_boxplot(aes(fill = class))

Plot the boxplots first and afterwards add the layer of points:

> ggplot(data = mpg, aes(x = class, y = cty)) +
+   geom_boxplot(aes(fill = class)) +
+   geom_jitter(alpha = 0.4)
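The order does matter: layers are drawn in the order in which they are added, so a later layer is painted over an earlier one. In the first plot the filled boxes cover part of the points; in the second the points sit on top of the boxes. The sketch below stores both plots and lists their layers; the helper layer_geoms() is hypothetical and only serves this illustration.

```r
library(ggplot2)

base <- ggplot(data = mpg, aes(x = class, y = cty))

points_first <- base + geom_jitter(alpha = 0.4) + geom_boxplot(aes(fill = class))
boxes_first  <- base + geom_boxplot(aes(fill = class)) + geom_jitter(alpha = 0.4)

# Hypothetical helper: name of the geom used by each layer, in drawing order
layer_geoms <- function(p) {
  vapply(p$layers, function(l) class(l$geom)[1], character(1))
}

layer_geoms(points_first)
layer_geoms(boxes_first)
```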


How to save a plot?

The plot to be saved:

> v <- ggplot(data = mpg, aes(x = class, y = cty)) + geom_boxplot() +
+   geom_jitter(alpha = 0.3)

Plots that are stored in an object can be saved with ggsave():

> ggsave(filename = "cool-boxplot-II.png", plot = v)
> # Cool, `ggsave()` recognizes the output format from the file extension!
> # (pdf, png, jpg, eps, svg)


How to save a plot?

Control the width and height and change the path:

> # the size and location of the saved figure can be controlled by
> ggsave(filename = "cool-boxplot-III.jpg", plot = v, width = 5, height = 4,
+        path = "/path/of/figures/")

Alternatively, open a graphics device yourself; do not forget to print the plot:

> pdf("cool-boxplot-IV.pdf")
> print(v)
> dev.off()
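A self-contained version of the ggsave() route that can be run without touching the working directory might look like this. The temporary directory and the file name are assumptions made for illustration.

```r
library(ggplot2)

v <- ggplot(data = mpg, aes(x = class, y = cty)) +
  geom_boxplot() +
  geom_jitter(alpha = 0.3)

# Hypothetical output location: a temporary directory, so nothing is overwritten
out_file <- file.path(tempdir(), "boxplot-example.pdf")

# width and height are given in inches here; units can be changed if needed
ggsave(filename = out_file, plot = v, width = 5, height = 4, units = "in")

file.exists(out_file)
```

Passing plot = v explicitly avoids relying on the default, which saves the last plot that was displayed.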


Book Recommendations: R graphics & ggplot

R Graphics Cookbook, Winston Chang, O’Reilly Media, 2012

and its online companion: http://www.cookbook-r.com/Graphs/

ggplot2: Elegant Graphics for Data Analysis (Use R!), Hadley Wickham, Springer, 2009

and its online companion: http://ggplot2.org/book/
