33
An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 1 / 33

An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

An introduction to ggplot2

Claude Renaux

Seminar for Statistics, ETH Zurich

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 1 / 33

Page 2: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Table of Contents

1 Overview

2 Selection of geoms

3 Multiple Layers and Faceting

4 Aesthetics

5 Tailoring Graphics

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 2 / 33

Page 3: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

ggplot2: plots of one variable

In this section we will . . .

. . . get started with ggplot2

. . . look at plots of one variable

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 3 / 33

Page 4: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Must have

Useful cheatsheet: https://www.rstudio.com/resources/cheatsheets/ (pickData Visualisation with ggplot2)

Source: link above. This image is under Creative Commons licence.

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 4 / 33

Page 5: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Why ggplot2?

To mention some advantages:

nice labels

nice colors

small margins

beautiful faceting or multipanel plots

very powerful and flexible: we’ll have a glimpse at the grammar ofgraphics

we can change or update plots

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 5 / 33

Page 6: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Why ggplot2?

To mention some disadvantages:

ggplot2 can only deal with data.frames

default plots of model outputs are normally not possible

ggplot2 is not optimized for the best speed performance

3-D plots are not possible

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 6 / 33

Page 7: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Overview: plots of one variable

One continuous variable

histogram: geom_histogram()

densities: geom_density()

frequency plot: geom_freqpoly()

One discrete variable

barplot: geom_bar()

pie plot: different coordinate system of barplot...

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 7 / 33

Page 8: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Package: ggplot2

There are two important functions.

qplot: similar to base plotting functions

ggplot: look at this function closer

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 8 / 33

Page 9: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Let’s have a look — mpg data set

Let’s look at the mpg data set from ggplot2. It contains 234 observationsabout the fuel efficiency of 38 popular cars in 1999 and 2008.> ?mpg

> str(mpg)

Let’s look at the democode.

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 9 / 33

Page 10: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

ggplot2: plots of two variables

In this section we will . . .

. . . glimpse at grammar of graphics underlying ggplot2

. . . create plots with two variables

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 10 / 33

Page 11: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Glimpse at grammar of graphics

The ”gg” in ggplot2 stands for grammar of graphics which is based onWilkinson’s (2005) grammar of graphics.

The layered grammar is useful because . . .

it is a generic way of creating a plot

do not relay on specific or customized graphic for a particular problem

iteratively update or create a plot

add layers

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 11 / 33

Page 12: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Glimpse at grammar of graphics

Idea: all the plots can be build from the same components

data set

geoms (geometric object) — visually representing observations

aes (aesthetics) — aesthetics properties of the geometric objects

scale — defining how the data is plotted

coordinate system

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 12 / 33

Page 13: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Glimpse at grammar of graphics

Source: https://www.rstudio.com/resources/cheatsheets/ of the Data Visualization

Cheat Sheet. This image is under Creative Commons licence.

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 13 / 33

Page 14: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Glimpse at grammar of graphics

Source: https://www.rstudio.com/resources/cheatsheets/ of the Data Visualization

Cheat Sheet. This image is under Creative Commons licence.

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 14 / 33

Page 15: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Overview: plots of two variables

Two continuous variables

scatter plot: geom_point()

scatter plot using jitter: geom_jitter()

smoother: geom_smooth()

Discrete x and continuous y

boxplot: geom_boxplot()

bar plot: geom_bar(stat = "identity")

Continuous function like time series

line plot: geom_line()

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 15 / 33

Page 16: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Let’s have a look — mpg data set

Let’s look at the democode.

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 16 / 33

Page 17: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

ggplot2: multiple layers and faceting

In this section we will . . .

. . . look at multiple layers

. . . consider faceting or multipanel conditioning plots

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 17 / 33

Page 18: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Multiple layers

Add multiple layers> # variable cty is the city miles per gallon

> ggplot(data = mpg, aes(x = hwy, y = cty)) +

+ geom_jitter() +

+ geom_smooth() +

+ geom_rug(sides = "bl", position = "jitter")

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●●

●●

● ●

●●

●●

●●

●●●

●●

●●●

●●

● ●●

●●

● ●●●

●●

● ●

●●

●●

●● ●

●●

●●

●●

● ●●

● ●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●● ●●

● ●

●● ●

●● ●

●●

● ●●

●●

●●

10

20

30

20 30 40hwy

cty

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 18 / 33

Page 19: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Faceting

Let’s look a multi-panel plots> # subset of the mpg data set

> mpg.small <- subset(mpg, manufacturer %in%

+ c("ford", "land rover", "toyota",

+ "chevrolet", "honda", "volkswagen"))

> ggplot(data = mpg.small, aes(x = hwy, y = cty)) +

+ geom_jitter() +

+ facet_wrap( ~ manufacturer)

●●●

●●

●●●

● ●

● ●●

●●●

●●

● ●●●●●●● ●

●●●

●● ● ●●

●●

●●

●●●

●●●

●●

●●●

●● ●●

●●●

● ● ●●

●● ●

● ●

●●

● ●●●

●●

●●

●●●●

●●

● ● ●●

●●

●●

chevrolet ford honda

land rover toyota volkswagen

10

15

20

25

30

35

10

15

20

25

30

35

20 30 40 20 30 40 20 30 40hwy

cty

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 19 / 33

Page 20: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Let’s have a look at the aesthetics

In this section we will have a look at the aesthetics . . .

. . . size

. . . shape

. . . color

. . . and combine them

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 20 / 33

Page 21: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

size

> ggplot(data = mpg, aes(x = hwy, y = cty, size = displ)) +

+ geom_jitter()

> # displ. engine displacement, in litres

● ●●

●●

●●

●●

●●

●●

●●●

● ●

● ●

●●●

●●

●●

●●

●●●●

● ●●

●●

●●

●●

●●●

● ●

● ●● ●

●●

●●●●●

●●●

●●

●●

● ● ●●●

●●

● ●

●●

●●●

●●

●●

●●

●●

● ●●●● ●●

●●●●

● ●

●●●

●●

● ●

●●

●●

●●●

● ●

●●●

●● ● ●

●●●

● ●●

●●●

●●

●●

● ●●●

●●

●●●

●●

● ●

10

15

20

25

30

35

20 30 40hwy

cty

displ●

●●●

2

3

4

5

6

7

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 21 / 33

Page 22: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

shape

> ggplot(data = mpg, aes(x = hwy, y = cty, shape = factor(cyl))) +

+ geom_jitter(size = 3)

> # one can set a fixed size for all the points

●●●

●●

● ●

●●

●●

● ●

●●

●●

●● ●

● ●●

●●

●● ●●● ● ●●

●●

●●

10

15

20

25

30

35

20 30 40hwy

cty

factor(cyl)

● 4

5

6

8

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 22 / 33

Page 23: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

color

> ggplot(data = mpg, aes(x = hwy, y = cty, color = factor(cyl))) +

+ geom_jitter(size = 3)

●●●

●●

●●

●●

●●

●●●

● ●

● ●●

●●

●●

●●

●●●●

●●

● ●●

●●

●●

●●

● ●

●●●

●●

●●●●●

●● ●

●●

●●

●● ●●●

●●

● ●

● ●

●●●● ●

●●

●●

●●

● ●

● ●● ●●

●●●●

● ●

●●●

●●

●●

●●

●●●

●●

●●●

●●●●

●●

● ●●

●● ●

● ●

●●●

●●●

● ●●

●●

10

20

30

10 20 30 40hwy

cty

factor(cyl)

4

5

6

8

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 23 / 33

Page 24: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Combine them

> ggplot(data = mpg, aes(x = hwy, y = cty, color = factor(cyl),

+ shape = factor(cyl), size = displ)) +

+ geom_jitter()

> # nice, there is only one combined legend for shape and color

●●

●●

● ●

●●

●●

●●

● ●

●●

●●●

● ●

●●

●●

●●● ● ●

● ●

●●

●●

10

15

20

25

30

35

20 30 40hwy

cty

displ●

●●●

2

3

4

5

6

7

factor(cyl)● 4

5

6

8

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 24 / 33

Page 25: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

The logic . . .

In this section we will look at. . .

. . . where to put what?

. . . the order of the layers

. . . how to save a plot

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 25 / 33

Page 26: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Where to put what?

All the arguments specified in the function ggplot() are passed to allthe layers.

Holds unless specified otherwise in a layer.

Arguments specified in a layer effect only the corresponding layer.

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 26 / 33

Page 27: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Where to put what?

Basic plot to start with> ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter() + geom_smooth()

Color both points and smoothers per group of drv⇒ three smoothers are fitted> ggplot(data = mpg, aes(x = hwy, y = cty, color = drv)) + geom_jitter() +

+ geom_smooth()

●●●

● ●

●●

●●

●●●

● ●

●●

●●

●●●●

●●

●●

●●

●● ●●

●●

●●

●●

● ●

● ●●

●●●●

●●

●●●

●●

● ● ●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

● ●●

●●●

● ●

●●

●●●

●●

●●

●●

●●

●●

●● ● ●

●●●

●● ●

●●

●●

● ●●●

●●

●●

●●

●●

10

20

30

20 30 40hwy

cty

drv●

4

f

r

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 27 / 33

Page 28: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Where to put what?

Color the points per group of drv⇒ one smoother is fitted> ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter(aes(color = drv)) +

+ geom_smooth()

●●

● ●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●●

●●●●

●● ●

●●

●● ●●

● ●

● ●

●●●● ●

●●

●●

●●

●●

● ●●●

●●

●●●

●●

●●

●●●

●●

●●

●●

● ●●● ●

●●●

●● ● ●

●●

● ● ●

●● ●

● ●

●●

● ●●

●●●

●●

●●

10

20

30

20 30 40hwy

cty

drv●

4

f

r

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 28 / 33

Page 29: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Where to put what?

Color the smoothers per groug drv

⇒ three smoothers are fitted> ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter() +

+ geom_smooth(aes(color = drv))

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

● ●

●●

●●●

●●

● ● ●

●●

●●

● ● ●●

●●

●●

●●

● ●

●●

●●●

●●

●●●

● ●●

●●

● ●●●

● ●●●

●●●

●●

●●

●●●

●●

●●

● ●●

● ●

●●

●● ●●

●●

● ● ●

● ●●

●●

● ●●

●●

●●

●●

● ●

10

20

30

20 30 40hwy

cty

drv

4

f

r

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 29 / 33

Page 30: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Does the order of the layers matter?

Plot the points first as a layer and then add the layer with boxplots:> ggplot(data = mpg, aes(x = class, y = cty)) +

+ geom_jitter(alpha = 0.4) +

+ geom_boxplot(aes(fill = class))

Plot the boxplot first and afterwards add the layer of points:> ggplot(data = mpg, aes(x = class, y = cty)) +

+ geom_boxplot(aes(fill = class)) +

+ geom_jitter(alpha = 0.4)

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 30 / 33

Page 31: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

How to save a plot?

The plot to be saved> v <- ggplot(data = mpg, aes(x = class, y = cty)) + geom_boxplot() +

+ geom_jitter(alpha = 0.3)

Plots which are saved in a object can be saved by ggsave().> ggsave(filename = "cool-boxplot-II.png", plot = v)

> # Cool, `ggsave()` recognizes the output format!

> # (pdf, png, jpg, eps, svg)

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 31 / 33

Page 32: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

How to save a plot?

Control the width & height and change the path> # the save of the figure can be controlled by

> ggsave(filename = "cool-boxplot-III.jpg", plot = v, width = 5, height = 4,

+ path = "/path/of/figures/")

Alternatively don’t forget print the plot> pdf("cool-boxplot-IV.pdf")

> print(v)

> dev.off()

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 32 / 33

Page 33: An introduction to ggplot2 · 2016. 7. 11. · An introduction to ggplot2 Claude Renaux Seminar for Statistics, ETH Zurich Seminar for Statistics, ETH Zurich (SfS) An introduction

Book Recommendations: R graphics & ggplot

R Graphics CookbookWinston Chang, O’Reilly Media, 2012

and its online companion:http://www.cookbook-r.com/Graphs/

ggplot2: Elegant Graphics for Data Analysis (Use R!)Hadley Wickham, Springer, 2009

and its online companion:http://ggplot2.org/book/

Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 33 / 33