68
Advanced Plotting with ggplot2 Algorithm Design & Software Engineering November 13, 2016 Stefan Feuerriegel

Advanced Plotting with ggplot2 - uni-freiburg.de

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Advanced Plotting with ggplot2 - uni-freiburg.de

Advanced Plotting with ggplot2Algorithm Design & Software Engineering

November 13, 2016Stefan Feuerriegel

Page 2: Advanced Plotting with ggplot2 - uni-freiburg.de

Today’s Lecture

Objectives

1 Distinguishing different types of plots and their purpose

2 Learning the grammar of graphics

3 Create high-quality plots with ggplot2

2Plotting

Page 3: Advanced Plotting with ggplot2 - uni-freiburg.de

Outline

1 Introduction

2 Plot Types (Geometries)

3 Plot Appearance

4 Advanced Usage

5 Wrap-Up

3Plotting

Page 4: Advanced Plotting with ggplot2 - uni-freiburg.de

Outline

1 Introduction

2 Plot Types (Geometries)

3 Plot Appearance

4 Advanced Usage

5 Wrap-Up

4Plotting: Introduction

Page 5: Advanced Plotting with ggplot2 - uni-freiburg.de

Motivation

Why plotting?I Visualizations makes it easier to understand and explore data

I Common types of plots: bar chart, histogram, line plot, scatter plot, boxplot, pirate plot, . . .

Plotting with ggplot2 in RI Built-in routines cover most types, yet the have no consistent interface

and limited flexibilityI Package ggplot2 is a powerful alternative

I Abstract language that is flexible, simple and user-friendlyI Nice aesthetics by defaultI Themes for common look-and-feel

I “gg” stands for “grammar of graphics”

I Limited to 2D plots (3D plots not supported)

I Commonly used by New York Times, Economics, . . .

5Plotting: Introduction

Page 6: Advanced Plotting with ggplot2 - uni-freiburg.de

Example with ggplot2

I Load package

library(ggplot2)

I Create sample data

line_data <- data.frame(year=1900:2000, price=1:101)

I Visualize data frame as line plot

ggplot(line_data, aes(x=year, y=price)) +geom_line()

0

25

50

75

100

1900 1925 1950 1975 2000year

pric

e

6Plotting: Introduction

Page 7: Advanced Plotting with ggplot2 - uni-freiburg.de

Calls to ggplot2General format

ggplot(data, aes(x=variable_x, y=variable_y)) +geom_*() +additional_modifications()

I ggplot() expects a data frame (not: matrix) as a first input, followedby the aesthetics that map variables by name onto axes

I Building blocks are concatenated via +I ∗ is any of the supported plot typesI The geom_*() can overwrite previous aesthetics

I ggplot(data) +geom_line(aes(x=variable_x, y=variable_y1)) +geom_line(aes(x=variable_x, y=variable_y2))

I ggplot(data, aes(x=variable_x)) +geom_line(aes(y=variable_y1)) +geom_line(aes(y=variable_y2))

7Plotting: Introduction

Page 8: Advanced Plotting with ggplot2 - uni-freiburg.de

Terminology

I Data: underlying information to be visualized

I Aesthetics: controls the color/shape/. . . of observations and whichvariables go on the x- and y-axis

I Geometry: geometric objects in the plot; e. g. points, lines, bars,polygons, . . .

I Layers: individual plots, i. e. calls to geom_*()

I Facets: creates panels of sub-plots

I Scales: sets look-and-feel of axes

I Themes: overall color palette and layout of plot

I Statistics: transformations of the data before displayI Legends: appearance and position of legend

I Each layer consists of data and aesthetics, plus additionalcustomizations

I A plot can have a one or an arbitrary number of layers

8Plotting: Introduction

Page 9: Advanced Plotting with ggplot2 - uni-freiburg.de

Aesthetics

I Aesthetics aes(...) set “what you see”I Variables which go on x- and y-axisI Color of outer borderI Fill color of inside areaI Shape of pointsI Line typeI Size of points and linesI Grouping of values

I Expect a column name representing the variable

I Short form by aes(x, y) where identifiers x= and y= are omitted

9Plotting: Introduction

Page 10: Advanced Plotting with ggplot2 - uni-freiburg.de

Wide vs. Long Data

Data formatI Wide data: multiple measurements for the same subject, each in a

different column

I Long data: subjects have multiple rows, each with one measurement

Example

Wide formatCompany Sales Drinks Sales Food

A 300 400B 200 100C 50 0

Long formatCompany Category Sales

A Drinks 300A Food 400B Drinks 200B Food 100C Drinks 50C Food 0

Note: ggplot2 requires data in long format

10Plotting: Introduction

Page 11: Advanced Plotting with ggplot2 - uni-freiburg.de

Conversion Between Long and Wide Data

I Prepare sample data

d_wide <- data.frame(Company = c("A", "B", "C"),SalesDrinks = c(300, 200, 50),SalesFood = c(400, 100, 0))

I Load necessary package reshape2

library(reshape2)

I Call function melt(data_wide, id.vars=v) to convert widedata into a long format where v identifies the subject

melt(d_wide, id.vars="Company")

## Company variable value## 1 A SalesDrinks 300## 2 B SalesDrinks 200## 3 C SalesDrinks 50## 4 A SalesFood 400## 5 B SalesFood 100## 6 C SalesFood 0

11Plotting: Introduction

Page 12: Advanced Plotting with ggplot2 - uni-freiburg.de

Outline

1 Introduction

2 Plot Types (Geometries)

3 Plot Appearance

4 Advanced Usage

5 Wrap-Up

12Plotting: Geometries

Page 13: Advanced Plotting with ggplot2 - uni-freiburg.de

Plot Types

I ggplot2 ships the following geometric objects (geoms) amongst others

I Function names start with geom_*()

Two variablesI Scatter plot (also named point plot) geom_point()

I Line plot geom_line()

I Area chart geom_area()

One variable (discrete)I Bar chart geom_bar()

One variable (continuous)I Histogram geom_histogram()

I Boxplot geom_boxplot()

I Density plot geom_density()13Plotting: Geometries

Page 14: Advanced Plotting with ggplot2 - uni-freiburg.de

Scatter Plot

I A scatter plot displays each observation as a geometric pointI Optional arguments: alpha (transparency), size, color, shape

points <- data.frame(x=rnorm(20), y=rnorm(20))p1 <- ggplot(points, aes(x, y)) +geom_point()

p2 <- ggplot(points, aes(x, y)) +geom_point(alpha=0.4, color="darkblue")

p1

●●

●●

−1.0

−0.5

0.0

0.5

1.0

−1 0 1 2x

y

p2

−1.0

−0.5

0.0

0.5

1.0

−1 0 1 2x

y

14Plotting: Geometries

Page 15: Advanced Plotting with ggplot2 - uni-freiburg.de

Point Shapes

I Argument shape accepts different values

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

22

21

24

23

20

I Shapes 21–24 distinguish two colors:I A border color (argument: color)I A fill color (argument: fill)

15Plotting: Geometries

Page 16: Advanced Plotting with ggplot2 - uni-freiburg.de

Scatter Plot

I Aesthetics can also change size, shape or color based on variables

ggplot(mpg, aes(x=displ, y=hwy)) +geom_point(aes(size=cyl, fill=drv), shape=21)

●●

●●

●●●●●

●●

●●●●●●●

●●●

●●

●●

●● ●●●● ●

●●●

● ●●●●●

●●●●●●●● ●●

●●●

● ●●● ● ●

●●●●●●●

●● ●●● ●●●●●● ●●●● ●●● ●●

●●●●

●●●● ●

●●●

●●

●●

●●

●●●

●●●

●●●●

●●●●

● ●●●

●●● ●●●●

● ●●

●●

●●●● ●

●●●

● ●●●

●●●

●●

●● ●●●

●●

●●

●●

●●

●●●●

●●

●●●●

●●

●●

●● ●

●●

●● ●●

●●

●●●●

●●

●● ●

20

30

40

2 3 4 5 6 7displ

hwy

4

f

r

cyl●

●●

4

5

6

7

8

16Plotting: Geometries

Page 17: Advanced Plotting with ggplot2 - uni-freiburg.de

Line Plot

I Line plot displays points as a connected line

x <- seq(0, 2*pi, by=0.01)data_sin_cos <- data.frame(x=x, sin=sin(x), cos=cos(x))

ggplot(data_sin_cos, aes(x)) +geom_line(aes(y=sin)) +geom_line(aes(y=cos), color="darkred")

−1.0

−0.5

0.0

0.5

1.0

0 2 4 6x

sin

I Optional arguments: color, linetype, size, group

17Plotting: Geometries

Page 18: Advanced Plotting with ggplot2 - uni-freiburg.de

Line Types

I Argument linetype picks a line type based the following identifiers

solid

dashed

dotted

dotdash

longdash

twodash

18Plotting: Geometries

Page 19: Advanced Plotting with ggplot2 - uni-freiburg.de

Line Plot

I Long data allows for efficient grouping and simpler plotsI Argument group denotes the variable with the group membershipI Alternative is to use color for different colors

data_lines2 <- data.frame(x=c(1:10, 1:10),var=c(rep("y1", 10), rep("y2", 10)),y=c(rep(5, 10), 11:20))

ggplot(data_lines2) +geom_line(aes(x=x, y=y, group=var))

5

10

15

20

2.5 5.0 7.5 10.0x

y

19Plotting: Geometries

Page 20: Advanced Plotting with ggplot2 - uni-freiburg.de

Line Plot

I Grouping can occur through all aestheticsI Common is to use color for different colors

data_lines2 <- data.frame(x=c(1:10, 1:10),var=c(rep("y1", 10), rep("y2", 10)),y=c(rep(5, 10), 11:20))

ggplot(data_lines2) +geom_line(aes(x=x, y=y, color=var))

5

10

15

20

2.5 5.0 7.5 10.0x

y

var

y1

y2

20Plotting: Geometries

Page 21: Advanced Plotting with ggplot2 - uni-freiburg.de

Area Chart

I Similar to a line plot, but the area is filled in color

I Individual areas are mapped via group and colored via fillI position="stack" stacks the areas on top of each other

ggplot(data_lines2) +geom_area(aes(x=x, y=y, fill=var, group=var),

position="stack")

0

5

10

15

20

25

2.5 5.0 7.5 10.0x

y

var

y1

y2

21Plotting: Geometries

Page 22: Advanced Plotting with ggplot2 - uni-freiburg.de

Area Chart

I Argument position="fill" shows relative values for each groupout of 100 %

ggplot(data_lines2) +geom_area(aes(x=x, y=y, fill=var, group=var),

position="fill")

0.00

0.25

0.50

0.75

1.00

2.5 5.0 7.5 10.0x

y

var

y1

y2

22Plotting: Geometries

Page 23: Advanced Plotting with ggplot2 - uni-freiburg.de

Bar Chart

I Bar chart compares values, counts and statistics among categories

I The x-axis usually displays the discrete categoriesI The y-axis depicts the given value (stat="identity") or also

transformed statistics

grades_freq <- data.frame(grade=c("good", "fair", "bad"),freq=c(3, 2, 5))

ggplot(grades_freq) +geom_bar(aes(grade, freq), stat="identity")

012345

bad fair goodgrade

freq

I Categories are sorted alphabetically by default23Plotting: Geometries

Page 24: Advanced Plotting with ggplot2 - uni-freiburg.de

Bar Chart

I stat="count" automatically counts the frequency of observations

grades <- data.frame(grade=c("good", "good", "good","fair", "fair","bad", "bad", "bad","bad", "bad"))

ggplot(grades) +geom_bar(aes(x=grade), stat="count")

0

1

2

3

4

5

bad fair goodgrade

coun

t

24Plotting: Geometries

Page 25: Advanced Plotting with ggplot2 - uni-freiburg.de

Stacked Bar Chart

I Group membership controlled by fill color

ggplot(diamonds) +geom_bar(aes(x=color, fill=cut), stat="count")

0

3000

6000

9000

D E F G H I Jcolor

coun

t

cut

Fair

Good

Very Good

Premium

Ideal

25Plotting: Geometries

Page 26: Advanced Plotting with ggplot2 - uni-freiburg.de

Grouped Bar Chart

I Bars are displayed next to each other via position="dodge"

ggplot(diamonds) +geom_bar(aes(x=color, fill=cut), stat="count",

position="dodge")

0

1000

2000

3000

4000

5000

D E F G H I Jcolor

coun

t

cut

Fair

Good

Very Good

Premium

Ideal

26Plotting: Geometries

Page 27: Advanced Plotting with ggplot2 - uni-freiburg.de

Histogram

I Histogram shows frequency of continuous data by dividing the range ofvalues into bins

I Each bar then denotes the frequency of data falling into that binI Illustrates the distribution of the data

ggplot(points) +geom_histogram(aes(x))

## ‘stat_bin()‘ using ‘bins = 30‘. Pick better value with‘binwidth‘.

0

1

2

3

4

−1 0 1 2x

coun

t

27Plotting: Geometries

Page 28: Advanced Plotting with ggplot2 - uni-freiburg.de

Histogram

I Optional arguments: border color (color), fill color (fill), width ofthe bins (binwidth)

I ggplot automatically defines new variables (..count.. and..density..) that can be used in the aesthetics

I y=..density.. displays density on y-axis instead of frequency

ggplot(points) +geom_histogram(aes(x, y=..density..), binwidth=0.1,

fill="steelblue", colour="black")

0.0

0.5

1.0

1.5

2.0

−1 0 1 2x

dens

ity

28Plotting: Geometries

Page 29: Advanced Plotting with ggplot2 - uni-freiburg.de

Box Plot

I Box plots visualize distribution by highlighting median and quartiles

height <- data.frame(gender=c(rep("m", 100), rep("f", 100)),height=c(rnorm(100, mean=175),

rnorm(100, mean=172)))ggplot(height) +

geom_boxplot(aes(gender, height))

172

174

176

f mgender

heig

ht

29Plotting: Geometries

Page 30: Advanced Plotting with ggplot2 - uni-freiburg.de

Density Plot

I Estimates the density as a mean to approximate the distribution

I Smooth alternative of a histogramI Optional argument: alpha allows colors to be transparent

ggplot(height) +geom_density(aes(x=height, fill=gender),

stat="density", alpha=0.6)

0.0

0.1

0.2

0.3

0.4

172 174 176height

dens

ity

gender

f

m

30Plotting: Geometries

Page 31: Advanced Plotting with ggplot2 - uni-freiburg.de

Outline

1 Introduction

2 Plot Types (Geometries)

3 Plot Appearance

4 Advanced Usage

5 Wrap-Up

31Plotting: Appearance

Page 32: Advanced Plotting with ggplot2 - uni-freiburg.de

Outline

3 Plot AppearanceLayersFacetsScalesThemesLegends

32Plotting: Appearance

Page 33: Advanced Plotting with ggplot2 - uni-freiburg.de

Multiple Layers

I Concatenation allows for combining several layersI Each layer has it own aesthetics

df <- data.frame(px=rnorm(20), py=rnorm(20),lx=seq(-1, +1, length.out=20))

df$ly <- df$lx^2

ggplot(df) +geom_point(aes(px, py)) +geom_line(aes(lx, ly))

●●

●●

−1.0

−0.5

0.0

0.5

1.0

−1 0 1 2px

py

33Plotting: Appearance

Page 34: Advanced Plotting with ggplot2 - uni-freiburg.de

Smoothing Layers

I Smoothing layer geom_smooth implements trend curvesI Linear trend (method="lm")I Local polynomial regression (method="loess") with smoothing

parameter spanI Generalized additive model (method="gam")

I Variable choice is also controlled by aesthetics aes(x, y)

I Gray shade highlights the 95 % confidence interval

df <- data.frame(x=seq(0.5, 3,length.out=100))

df$y <- sin(df$x) + rnorm(100)

ggplot(df, aes(x, y)) +geom_point() +geom_smooth(method="lm")

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●●●

●●

●●●

−1

0

1

2

3

1 2 3x

y

34Plotting: Appearance

Page 35: Advanced Plotting with ggplot2 - uni-freiburg.de

Smoothing Layersmethod="lm"

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●●●

●●

●●●

−1

0

1

2

3

1 2 3

method="gam"

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●●●

●●

●●●

−1

0

1

2

3

1 2 3

method="loess", span=0.25

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●●●

●●

●●●

−1

0

1

2

3

1 2 3

method="loess", span=0.75

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●●●

●●

●●●

−1

0

1

2

3

1 2 3

35Plotting: Appearance

Page 36: Advanced Plotting with ggplot2 - uni-freiburg.de

Outline

3 Plot AppearanceLayersFacetsScalesThemesLegends

36Plotting: Appearance

Page 37: Advanced Plotting with ggplot2 - uni-freiburg.de

Facets

I Facets display a grid of plots stemming from the same dataI Command: facet_grid(y ~ x) specifies grouping variablesI By default, the same axis resolution is used on adjacent plots

Example with 1 group on x-axis

ggplot(mpg, aes(displ, hwy)) +geom_point(alpha = 0.3) +facet_grid(. ~ year)

1999 2008

20

30

40

2 3 4 5 6 7 2 3 4 5 6 7displ

hwy

37Plotting: Appearance

Page 38: Advanced Plotting with ggplot2 - uni-freiburg.de

FacetsExample with 2 groups on x- and y-axis

ggplot(mpg, aes(displ, hwy)) +geom_point(alpha = 0.3) +facet_grid(cyl ~ year)

1999 2008

203040

203040

203040

203040

45

68

2 3 4 5 6 7 2 3 4 5 6 7displ

hwy

38Plotting: Appearance

Page 39: Advanced Plotting with ggplot2 - uni-freiburg.de

Outline

3 Plot AppearanceLayersFacetsScalesThemesLegends

39Plotting: Appearance

Page 40: Advanced Plotting with ggplot2 - uni-freiburg.de

Scales

Scales control the look of axes, especially for continuous and discrete data

I scale_<axis>_log10() useslog-scale on axisexp_growth <- data.frame(x=1:10,

y=2^(1:10))ggplot(exp_growth, aes(x, y)) +geom_point() +scale_y_log10()

10

100

1000

2.5 5.0 7.5 10.0x

y

40Plotting: Appearance

Page 41: Advanced Plotting with ggplot2 - uni-freiburg.de

Scales

I coord_equal() enforcesan equidistant scaling onboth axes

ggplot(df, aes(x, y)) +geom_point() +coord_equal() ●

●●

●●●

●●

●●

●●

●●

●●

−1

0

1

2

3

1 2 3x

y

41Plotting: Appearance

Page 42: Advanced Plotting with ggplot2 - uni-freiburg.de

Geometry Layout

I Changes to geometry layout links to the use of aestheticsI Additional function call to scale_<aestetics>_<type>(...)

1 Aesthetic to change, e. g. color, fill, linetype, . . .2 Variable type controls appearance, e. g. gradient (continuous scale),hue (discrete values), manual (manual breaks), . . .

ggplot(mtcars, aes(x=mpg, y=disp)) +geom_point(aes(size=hp, color=as.factor(cyl)))

●●

●●●●

● ●●

●●●

●● ●

●●

●●

●100

200

300

400

10 15 20 25 30 35mpg

disp

as.factor(cyl)●

4

6

8

hp

●●

100

150

200

250

300

42Plotting: Appearance

Page 43: Advanced Plotting with ggplot2 - uni-freiburg.de

scale_color_gradient

I Color gradient stems from a range between two colors→ Arguments: low, high

I Useful for visualizing continuous values

points_continuous <- cbind(points, z=runif(20))p <- ggplot(points_continuous) +

geom_point(aes(x=x, y=y, color=z))

p + scale_color_gradient()

●●

●●

●●

−1.0−0.5

0.00.51.0

−1 0 1 2x

y

0.25

0.50

0.75

p + scale_color_gradient(low="darkblue",high="darkred")

●●

●●

●●

−1.0−0.5

0.00.51.0

−1 0 1 2x

y

0.25

0.50

0.75

43Plotting: Appearance

Page 44: Advanced Plotting with ggplot2 - uni-freiburg.de

scale_color_hue

I Uses disjunct buckets of colors for visualizing discrete values

I Requires source variable to be a factor

points_discrete <- cbind(points,z=as.factor(sample(5, 20, replace=TRUE)))

p <- ggplot(points_discrete) +geom_point(aes(x=x, y=y, color=z))

p + scale_color_hue()

●●

●●

●●

−1.0−0.5

0.00.51.0

−1 0 1 2x

y

1

2

3

4

5

p + scale_color_hue(h=c(180, 270))

●●

●●

●●

−1.0−0.5

0.00.51.0

−1 0 1 2x

y

1

2

3

4

5

44Plotting: Appearance

Page 45: Advanced Plotting with ggplot2 - uni-freiburg.de

scale_color_manual

I Specifies colors for different groups manuallyI Argument values specifies a vector of new color names

ggplot(data_lines2) +geom_line(aes(x=x, y=y, color=var)) +scale_color_manual(values=c("darkred", "darkblue"))

5

10

15

20

2.5 5.0 7.5 10.0x

y

var

y1

y2

45Plotting: Appearance

Page 46: Advanced Plotting with ggplot2 - uni-freiburg.de

Color Palettes

I Built-in color palettes change color schemeI Distinguished by discrete and continuous source variables

1 Discrete values and colors via scale_color_brewer()2 Continuous values and colors via scale_color_distiller()

I Further customizationsI Overview of color palettes:http://www.cookbook-r.com/Graphs/Colors_(ggplot2)

I Package ggtheme has several built-in schemes:https://cran.r-project.org/web/packages/ggthemes/vignettes/ggthemes.html

I Color picker:http://www.colorbrewer2.org/

46Plotting: Appearance

Page 47: Advanced Plotting with ggplot2 - uni-freiburg.de

Discrete Color Palettes

I scale_color_brewer accesses built-in color palettes for discretevalues

pd <- ggplot(points_discrete) +labs(x="", y="") +geom_point(aes(x, y, color=z))

Default

pd + scale_color_brewer()

●●

●●

−1.0

−0.5

0.0

0.5

1.0

−1 0 1 2

z●

1

2

3

4

5

Intense colors

pd + scale_color_brewer(palette="Set1")

●●

●●

−1.0

−0.5

0.0

0.5

1.0

−1 0 1 2

z●

1

2

3

4

5

47Plotting: Appearance

Page 48: Advanced Plotting with ggplot2 - uni-freiburg.de

Continuous Color Palettes

I scale_color_distiller accesses built-in color palettes forcontinuous values

pc <- ggplot(points_continuous) +labs(x="", y="") +geom_point(aes(x, y, color=z))

Default

pc + scale_color_distiller()

●●

●●

−1.0

−0.5

0.0

0.5

1.0

−1 0 1 2

0.25

0.50

0.75

z

Spectral colorspc + scale_color_distiller(palette="Spectral")

●●

●●

−1.0

−0.5

0.0

0.5

1.0

−1 0 1 2

0.25

0.50

0.75

z

48Plotting: Appearance

Page 49: Advanced Plotting with ggplot2 - uni-freiburg.de

Gray-Scale Coloring

I No unique identifier for gray-scale coloring1 scale_color_gray() colors discrete values in gray-scale→ Attention: “grey” as used in British English

2 scale_color_gradient() refers to a continuous spectrum

Discrete values

pd + scale_color_grey()

●●

●●

−1.0

−0.5

0.0

0.5

1.0

−1 0 1 2

z●

1

2

3

4

5

Continuous values

pc + scale_color_gradient(low="white",high="black")

●●

●●

−1.0

−0.5

0.0

0.5

1.0

−1 0 1 2

0.25

0.50

0.75

z

49Plotting: Appearance

Page 50: Advanced Plotting with ggplot2 - uni-freiburg.de

Ranges

I Crop plot to ranges via xlim(range)or ylim(range)

ggplot(df, aes(x, y)) +geom_point() +xlim(c(1, 2)) +ylim(c(-1, +1))

●●

●●

−1.0

−0.5

0.0

0.5

1.0

1.00 1.25 1.50 1.75 2.00x

y

50Plotting: Appearance

Page 51: Advanced Plotting with ggplot2 - uni-freiburg.de

Outline

3 Plot AppearanceLayersFacetsScalesThemesLegends

51Plotting: Appearance

Page 52: Advanced Plotting with ggplot2 - uni-freiburg.de

Themes

I Themes further customize the appearance of plots

I Printer-friendly theme theme_bw() for replacing the graybackground

ggplot(df, aes(x, y)) +geom_point()

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●●●

●●

●●●

−1

0

1

2

3

1 2 3x

y

ggplot(df, aes(x, y)) +geom_point() +theme_bw()

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●●●

●●

●●●

−1

0

1

2

3

1 2 3x

y

52Plotting: Appearance

Page 53: Advanced Plotting with ggplot2 - uni-freiburg.de

Themes

I Package ggthemes provides further styles

library(ggthemes)

Example with the style from The Economist

ggplot(mpg, aes(displ, hwy)) +geom_point() +theme_economist()

●●●●

●●●

●●

●●

●● ●●●

●●

● ●

●●

●●

●● ●

●●●●

●●● ●

●●

●●●●

●●

● ●

●●

●●

●●●

●●

●●

● ●●

●●●● ●

●●●●●●

●●

●●

●●

●●●●

●●●

●●●●

●●

●●

●●●

●●●

●●●●

●●

●●

●●

● ●

●●●●

●● ●

●●

●●

●●●

● ●

●●

●●

● ●●●

●●●

●●●

●● ●●●●●●

●●●●

●●

●●

●●●

●●

●●

●●●

●●●●

●●●●

●●

●●

●●

●● ●●

●●

●●●●●

●●

●● ●

20

30

40

2 3 4 5 6 7displ

hwy

53Plotting: Appearance

Page 54: Advanced Plotting with ggplot2 - uni-freiburg.de

Labels

I Change labels via labs(...)

ggplot(df, aes(x, y)) +geom_point() +labs(x = "x-axis", y = "y-axis")

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●●●

●●

●●●

−1

0

1

2

3

1 2 3x−axis

y−ax

is

Recommendation: don’t use titles in plots→ Instead of titles, better place details in the caption of scientific papers

54Plotting: Appearance

Page 55: Advanced Plotting with ggplot2 - uni-freiburg.de

Outline

3 Plot AppearanceLayersFacetsScalesThemesLegends

55Plotting: Appearance

Page 56: Advanced Plotting with ggplot2 - uni-freiburg.de

Legend

I Legends are placed automatically for each aesthetic in usedI Examples: group, color, . . .

ggplot(data_lines2) +geom_line(aes(x=x, y=y, color=var))

5

10

15

20

2.5 5.0 7.5 10.0x

y

var

y1

y2

I Frequent changes include1 Data is in long format and should be renamed2 Data is in long format and should be customized3 Data is in wide format and each geom_* should be customized

56Plotting: Appearance

Page 57: Advanced Plotting with ggplot2 - uni-freiburg.de

Legend

Case 1: Data is in long format and should be renamedI Add scale_<aesthetics>_discrete(...) to overwrite

matchingI Argument labels specifies new labels

ggplot(data_lines2) +geom_line(aes(x=x, y=y, color=var)) +scale_color_discrete(labels=c("One", "Two"))

5

10

15

20

2.5 5.0 7.5 10.0x

y

var

One

Two

57Plotting: Appearance

Page 58: Advanced Plotting with ggplot2 - uni-freiburg.de

Legend

Case 2: Data is in long format and should be customized

I Add scale_<aesthetics>_manual to change appearanceI Argument values specifies new attributes (e. g. color)

ggplot(data_lines2) +geom_line(aes(x=x, y=y, color=var)) +scale_color_manual(values=c("darkred", "darkblue"))

5

10

15

20

2.5 5.0 7.5 10.0x

y

var

y1

y2

58Plotting: Appearance

Page 59: Advanced Plotting with ggplot2 - uni-freiburg.de

Legend

Case 3: Data is in wide format and each geom_* should be customizedI Add additional aesthetics with string identifierI Change appearance with scale_<aesthetics>_manual()

ggplot(data_sin_cos, aes(x)) +geom_line(aes(y=sin, color="sin")) +geom_line(aes(y=cos, color="cos")) +scale_color_manual(labels=c("Sine", "Cosine"),

values=c("darkred", "darkblue"))

−1.0

−0.5

0.0

0.5

1.0

0 2 4 6x

sin

colour

Sine

Cosine

I Recommendation: better convert to long format with melt(...)! 59Plotting: Appearance

Page 60: Advanced Plotting with ggplot2 - uni-freiburg.de

Legend Position

I Default position of legend is outside of plotI theme(legend.position="none") hides the legendI theme(legend.position=c(x, y)) moves it inside the gridI x ,y ∈ [0,1] are relative positions starting from the bottom-left corner

ggplot(data_lines2) +geom_line(aes(x=x, y=y, color=var)) +theme(legend.position=c(0.2, 0.6))

5

10

15

20

2.5 5.0 7.5 10.0x

y

var

y1

y2

60Plotting: Appearance

Page 61: Advanced Plotting with ggplot2 - uni-freiburg.de

Legend Title

I Legend title is set inside scale_<aesthetics>_<type>(...)

I Passed as the first argument or argument name

I Displays maths via expression(...)

p <- ggplot(data_lines2) +geom_line(aes(x=x, y=y, color=var))

p + scale_color_discrete(name="new")

5

10

15

20

2.5 5.0 7.5 10.0x

y

new

y1

y2

p + scale_color_discrete(expression(alpha[i]))

5

10

15

20

2.5 5.0 7.5 10.0x

y

αi

y1

y2

61Plotting: Appearance

Page 62: Advanced Plotting with ggplot2 - uni-freiburg.de

Outline

1 Introduction

2 Plot Types (Geometries)

3 Plot Appearance

4 Advanced Usage

5 Wrap-Up

62Plotting: Advanced Usage

Page 63: Advanced Plotting with ggplot2 - uni-freiburg.de

qplot

I qplot(x, y) is a wrapper similar to plot(...)

Histogram

qplot(df$x)

0

1

2

3

4

1 2 3df$x

coun

t

Point plot

qplot(df$x, df$y)

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●●●

●●

●●●

−1

0

1

2

3

1 2 3df$x

df$y

Line plot

qplot(df$x, df$y,geom="line")

−1

0

1

2

3

1 2 3df$x

df$y

63Plotting: Advanced Usage

Page 64: Advanced Plotting with ggplot2 - uni-freiburg.de

Date and Time

I Values of type date or time are formatted automatically

Datedates <- as.Date(c("2016-01-01", "2016-02-01",

"2016-07-01", "2016-12-01"))sales <- data.frame(date=dates,

value=c(10, 20, 40, 30))ggplot(sales, aes(date, value)) +

geom_line()

10

20

30

40

Jan 2016 Apr 2016 Jul 2016 Okt 2016date

valu

e

Timetimes <- as.POSIXct(c("2001-01-01 10:00",

"2001-01-01 12:00","2001-01-01 15:00"))

temp <- data.frame(time=times,value=c(15, 20, 25))

ggplot(temp, aes(time, value)) +geom_line()

15.0

17.5

20.0

22.5

25.0

10:00 11:00 12:00 13:00 14:00 15:00time

valu

e

64Plotting: Advanced Usage

Page 65: Advanced Plotting with ggplot2 - uni-freiburg.de

MapsI Package ggmap allows to plot geometries on a map

library(ggmap)

I Download map with get_map(...)map <- get_map("Germany", zoom=5, color="bw")

I Coordinates are given as longitude/latitudegeo <- data.frame(lat=c(52.52, 50.12, 48.15),

lon=c(13.41, 8.57, 11.54))ggmap(map) +

geom_point(data=geo, aes(lon, lat), color="red")

45

50

55

0 5 10 15 20lon

lat

65Plotting: Advanced Usage

Page 66: Advanced Plotting with ggplot2 - uni-freiburg.de

Exporting Plots

I Workflow is greatly accelerated when exporting plots automatically

I PDF output is preferred in LATEX, PNG for WordI ggsave(filename) exports the last plot to the disk

1 Export as PNG

ggsave("plot.png")

2 Export as PDF

ggsave("plot.pdf")

I File extension specifies format implicitlyI Alternative arguments specify filename and size (i. e. resolution)

p <- ggplot(df, aes(x, y))ggsave(p, file="/path/plot.pdf",

width=6, height=4)

66Plotting: Advanced Usage

Page 67: Advanced Plotting with ggplot2 - uni-freiburg.de

Outline

1 Introduction

2 Plot Types (Geometries)

3 Plot Appearance

4 Advanced Usage

5 Wrap-Up

67Plotting: Wrap-Up

Page 68: Advanced Plotting with ggplot2 - uni-freiburg.de

Further ReadingOnline resources

I Official ggplot2 documentationhttp://docs.ggplot2.org/current/→ Collection of reference materials and examples how parameters affect the layout

I Cookbook for R Graphshttp://www.cookbook-r.com/Graphs/→ Collection of problem-solution pairs by plot type with different layout customizations

I Introduction to R Graphics with ggplot2http://en.slideshare.net/izahn/rgraphics-12040991→ Introductory presentation with many examples

I ggplot2 Essentialshttp://www.sthda.com/english/wiki/ggplot2-essentials→ Overview of different plots and available options for customization

Books

I Wickham (2016). “ggplot2: Elegant Graphics for Data Analysis”, 2nd ed., Springer.

68Plotting: Wrap-Up