R package ggplot2 - Gaston Sanchez · R package ggplot2 STAT 133 Gaston Sanchez Department of...

Preview:

Citation preview

R package ggplot2STAT 133

Gaston Sanchez

Department of Statistics, UC–Berkeley

gastonsanchez.com

github.com/gastonstat/stat133

Course web: gastonsanchez.com/stat133

ggplot2

2

Scatterplot with "ggplot2"

Terminology

I aesthetic mappings

I geometric objects

I statistical transformations

I scales

I non-data elements (themes & elements)

I facets

3

Considerations

Specifying graphical elements from 3 sources:

I The data values (represented by the geometric objects)

I The scales and coordinate system (axes, legends)

I Plot annotations (background, title, grid lines)

4

Scatterplot with geom point

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point()

●●

●●

● ●●

●●

100

200

300

10 15 20 25 30 35mpg

hp

5

Another geom

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_line()

100

200

300

10 15 20 25 30 35mpg

hp

6

Mapping Attributes-vs-

Setting Attributes

7

Increase size of points

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(size = 3)

●●

●●

● ●●

●●

●●

●●

●100

200

300

10 15 20 25 30 35mpg

hp

8

How does it work?

To increase the size of points, we set the aesthetic size to aconstant value of 3 (inside the geoms function):

+ geom_point(size = 3)

9

Adding color

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(size = 3, color = "tomato")

●●

●●

● ●●

●●

●●

●●

●100

200

300

10 15 20 25 30 35mpg

hp

10

Adding color

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(size = 3, color = "#259ff8")

●●

●●

● ●●

●●

●●

●●

●100

200

300

10 15 20 25 30 35mpg

hp

11

Test your knowledge

Identify the valid hex-color

A) "345677"

B) "#1234567"

C) "#AAAAAA"

D) "#GG0033"

12

Changing points shape# 'shape' accepts 'pch' values

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(size = 3, color = "tomato", shape = 15)

100

200

300

10 15 20 25 30 35mpg

hp

13

Setting and Mapping

Aesthetic attributes can be either mapped —via aes()— orset

# mapping aesthetic color

ggplot(mtcars, aes(x = mpg, y = hp)) +

geom_point(aes(color = cyl))

# setting aesthetic color

ggplot(mtcars, aes(x = mpg, y = hp)) +

geom_point(color = "blue")

14

Geom text, and mapping labels

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_text(aes(label = gear))

444

3

3

3

3

4

4

44

3 3333

3

44

4

3

33

3

3

45

5

5

5

5

4100

200

300

10 15 20 25 30 35mpg

hp

15

Changing axis labels and title

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(size = 3, color = "tomato") +

xlab("miles per gallon") +

ylab("horse power") +

ggtitle("Scatter plot with ggplot2")

●●

●●

● ●●

●●

●●

●●

●100

200

300

10 15 20 25 30 35miles per gallon

ho

rse

pow

er

Scatter plot with ggplot2

16

Changing background themeggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(size = 3, color = "tomato") +

xlab("miles per gallon") +

ylab("horse power") +

ggtitle("Scatter plot with ggplot2") +

theme_bw()

●●

●●

● ●●

●●

●●

●●

●100

200

300

10 15 20 25 30 35miles per gallon

ho

rse

pow

er

Scatter plot with ggplot2

17

Your turn: Replicate this figure

100

200

300

10 15 20 25 30 35miles per gallon

ho

rse

pow

er disp

100

200

300

400

18

Your turn: Replicate this figure

I Specify a color in hex notation

I Change the shape of the point symbol

I Map disp to attribute size of points

I Add axis labels

19

Your turn

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(aes(size = disp),

color = "#ff6666", shape = 17) +

xlab("miles per gallon") +

ylab("horse power")

20

More geomsggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point() +

geom_smooth(method = "lm")

●●

●●

● ●●

●●

●●

0

100

200

300

10 15 20 25 30 35mpg

hp

21

More geomsWe can map variable to a color aesthetic. Here we map colorto cyl (cylinders)

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(aes(color = cyl))

●●

●●

● ●●

●●

100

200

300

10 15 20 25 30 35mpg

hp

4

5

6

7

8cyl

22

More geomsIf the variable that maps to color is a factor, then the colorscale will change

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(aes(color = as.factor(cyl)))

●●

●●

● ●●

●●

100

200

300

10 15 20 25 30 35mpg

hp

as.factor(cyl)

4

6

8

23

Your turn: Replicate this figure

100

200

300

400

10 15 20 25 30 35miles per gallon

dis

pla

cem

en

t

factor(am)

0

1

hp

100

150

200

250

300

Scatter plot with ggplot2

24

Your turn: example 2

I Map hp to attribute size of points

I Map am (as factor) to attribute color points

I Add an alpha transparency of 0.7

I Change the shape of the point symbol

I Add axis labels

I Add a title

25

Your turn: example 2

ggplot(data = mtcars, aes(x = mpg, y = disp)) +

geom_point(aes(size = hp, color = factor(am)),

alpha = 0.7) +

xlab("miles per gallon") +

ylab("displacement") +

ggtitle("Scatter plot with ggplot2")

26

Histogram

ggplot(data = mtcars, aes(x = mpg)) +

geom_histogram(binwidth = 2)

0

2

4

6

10 20 30mpg

coun

t

27

Boxplots

ggplot(data = mtcars, aes(x = factor(cyl), y = mpg)) +

geom_boxplot()

●●

10

15

20

25

30

35

4 6 8factor(cyl)

mpg

28

Density Curves

ggplot(data = mtcars, aes(x = mpg)) +

geom_density()

0.00

0.02

0.04

0.06

10 15 20 25 30 35mpg

dens

ity

29

Density Curves

ggplot(data = mtcars, aes(x = mpg)) +

geom_density(fill = "#c6b7f5")

0.00

0.02

0.04

0.06

10 15 20 25 30 35mpg

dens

ity

30

Density Curves

ggplot(data = mtcars, aes(x = mpg)) +

geom_density(fill = "#c6b7f5", alpha = 0.4)

0.00

0.02

0.04

0.06

10 15 20 25 30 35mpg

dens

ity

31

Density Curves

ggplot(data = mtcars, aes(x = mpg)) +

geom_line(stat = 'density', col = "#a868c0", size = 2)

0.02

0.03

0.04

0.05

0.06

0.07

10 15 20 25 30 35mpg

dens

ity

32

Density Curves

ggplot(data = mtcars, aes(x = mpg)) +

geom_density(fill = '#a868c0') +

geom_line(stat = 'density', col = "#a868c0", size = 2)

0.00

0.02

0.04

0.06

10 15 20 25 30 35mpg

dens

ity

33

ggplot objects

34

Plot objects

You can assign a plot to a new object (this won’t plotanything):

mpg_hp <- ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(size = 3, color = "tomato")

To show the actual plot associated to the object mpg hp use thefunction print()

print(mpg_hp)

35

"ggplot2" objects

working with ggplot objects, we can ...I define a basic plot, to which we can add or change layers

without typing everything again

I render it on screen with print()

I describe its structure with summary()

I render it to disk with ggsave()

I save a cached copy to disk with save()

36

Adding a title and axis labels to a ggplot2 object:

mpg_hp + ggtitle("Scatter plot with ggplot2") +

xlab("miles per gallon") + ylab("horse power")

●●

●●

● ●●

●●

●●

●●

●100

200

300

10 15 20 25 30 35miles per gallon

ho

rse

pow

er

Scatter plot with ggplot2

37

Your turn: example 3

Create the following ggplot object:

# ggplot object

obj <- ggplot(data = mtcars,

aes(x = mpg, y = hp, label = rownames(mtcars)))

Add more layers to the object "”obj” in order to replicate thefigure in the following slide:

38

Your turn: example 3

Mazda RX4Mazda RX4 WagDatsun 710

Hornet 4 Drive

Hornet Sportabout

Valiant

Duster 360

Merc 240D

Merc 230

Merc 280Merc 280C

Merc 450SEMerc 450SLMerc 450SLC

Cadillac FleetwoodLincoln Continental

Chrysler Imperial

Fiat 128Honda Civic

Toyota Corolla

Toyota Corona

Dodge ChallengerAMC Javelin

Camaro Z28

Pontiac Firebird

Fiat X1−9

Porsche 914−2

Lotus Europa

Ford Pantera L

Ferrari Dino

Maserati Bora

Volvo 142E100

200

300

10 15 20 25 30 35miles per gallon

hors

e po

wer

factor(am)

aa

0

1

Scatter plot

39

Your turn: example 3

obj +

geom_text(aes(color = factor(am))) +

ggtitle("Scatter plot") +

xlab("miles per gallon") +

ylab("horse power")

40

Scales

41

Scales

I The scales component encompases the ideas of both axesand legends on plots, e.g.:

I Axes can be continuous or discreteI Legends involve colors, symbol shapes, size, etc

– scale x continuous

– scale y continuous

– scale color manual

I scales will often automatically generate appropriate scalesfor plots

I Explicitly adding a scale component overrides the defaultscale

42

Continuous axis scales

Use scale x continuous() to modify the default values in thex axis

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(aes(color = factor(am))) +

scale_x_continuous(name = "miles per gallon",

limits = c(10, 40),

breaks = c(10, 20, 30, 40))

43

Continuous axis scales

●●

●●

● ●●

●●

100

200

300

10 20 30 40miles per gallon

hp

factor(am)

0

1

44

Continuous axis scales

Use scale y continuous() to modify the default values in they axis

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(aes(color = factor(am))) +

scale_x_continuous(name = "miles per gallon",

limits = c(10, 40),

breaks = c(10, 20, 30, 40)) +

scale_y_continuous(name = "horsepower",

limits = c(50, 350),

breaks = seq(50, 350, by = 50))

45

Continuous axis scales

●●

●●

● ●●

●●

50

100

150

200

250

300

350

10 20 30 40miles per gallon

hors

epow

er factor(am)

0

1

46

Example: color scale

Use scale color manual() to modify the colors associated toa factor

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(aes(color = factor(am))) +

scale_color_manual(values = c("orange", "purple"))

47

Example: color scale

●●

●●

● ●●

●●

100

200

300

10 15 20 25 30 35mpg

hp

factor(am)

0

1

48

Example: modifying legend

Modifying legends depends on the type of scales (e.g. color,shapes, size, etc)

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(aes(color = factor(am))) +

scale_color_manual(values = c("orange", "purple"),

name = "transmission",

labels = c('no', 'yes'))

49

Example: modifying legend

●●

●●

● ●●

●●

100

200

300

10 15 20 25 30 35mpg

hp

transmission

no

yes

50

Faceting

51

Faceting with facet wrap()

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(color = "#3088f0") +

facet_wrap(~ cyl)

●● ●●●

●●

● ●

●●●

●●

4 6 8

100

200

300

10 15 20 25 30 3510 15 20 25 30 3510 15 20 25 30 35mpg

hp

52

Faceting with facet grid()

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(color = "#3088f0") +

facet_grid(cyl ~ .)

●●

●●

●●

●● ●●●●

● ●●●●

●●

100

200

300

100

200

300

100

200

300

46

8

10 15 20 25 30 35mpg

hp

53

Faceting with facet grid()

ggplot(data = mtcars, aes(x = mpg, y = hp)) +

geom_point(color = "#3088f0") +

facet_grid(. ~ cyl)

4 6 8

●● ●●●

●●

● ●

●●●

●●

100

200

300

10 15 20 25 30 3510 15 20 25 30 3510 15 20 25 30 35mpg

hp

54

Layered Grammar

About "ggplot2"

I Key concept: layer (layered grammar of graphics)

I Designed to work in a layered fashion

I Starting with a layer showing the data

I Then adding layers of annotations and statisticaltransformations

I Core idea: independents components combined togehter

55

Some Concepts

I the data to be visualized

I a set of aesthetic mappings describing how varibales aremapped to aesthetic attributes

I geometric objects, geoms, representing what you see onthe plot (points, lines, etc)

I statistical transformations, stats, summarizing data invarious ways

I scales that map values in the data space to values in anaesthetic space

I a coordinate system, coord, describing how datacoordinates are mapped to the plane of the graphic

I a faceting specification describing how to break up thedata into subsets and to displays those subsets

56

Recommended