Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Customizing chartsAndrew Ba Tran
ContentsReordering chart labels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2Lollipop plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4Saving ggplots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9Scales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Scales for color and fill . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14Annotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16Themes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17Your turn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
This is from the fourth chapter of learn.r-journalism.com.
Let’s bring that data back in again.library(readr)
ages
AmeliaAmerican Ganster
An Officer and a GentlemanApollo 13Arbitrage
Autumn in New YorkBatman & Robin
Bee SeasonBlade Runner
BlowCaptain Phillips
Cast AwayCharlie Wilson's War
ChocolatCloud Atlas
Dark ShadowsDejaVu
Edward ScissorhandsEmpire Strikes Back
Extremely Loud & Incredibly CloseFar and Away
FirewallFirst Knight
FlightForrest GumpHe Got Game
Intolerable CrueltyJerry Maguire
John QKnight and Day
LeatherheadsLosin It
Malcolm XMission Impossible III
Mo' Better BluesNights in Rodanthe
O Brother, Where Art Thou?Oblivion
Ocean's ElevenOne Fine Day
Out of SightOut of Time
Pirates of the CaribbeanPretty Woman
Raiders of the Lost ArkRandom Hearts
Regarding HenryRemember the Titans
Risky BusinessRock of Ages
SabrinaShall We Dance
Six Days Seven NightsSleepless in Seattle
Sleepy HollowSolaris
SommersbySplash
The AmericanThe Good German
The HoaxThe Money Pit
The Preacher's WifeThe Rum Diary
The TouristTop Gun
Training DayTranscendence
Up in the AirVanilla Sky
What Lies BeneathWhat's Eating Gilbert Grape
WitnessWorking Girl
20 30 40 50
actress_age
Mov
ie
It’s not that great, right? It’s in reverse alphabetical order.
Let’s reorder it based on age.
Reordering chart labels
This means we need to transform the data.
The easiest way to do this is with the package forcats, which (surprise!) is also part of the tidyverse universe.
The function is fct_reorder() and it works like this
2
https://forcats.tidyverse.org/index.html
# If you don't have forcats installed yet, uncomment the line below and run# install.packages("forcats")
library(forcats)ggplot(ages,
aes(x=actress_age, y=fct_reorder(Movie, actress_age, desc=TRUE))) +geom_point()
3
Edward ScissorhandsSleepy Hollow
What's Eating Gilbert GrapePirates of the Caribbean
Blade RunnerHe Got GamePretty Woman
Empire Strikes BackRisky Business
SplashDark ShadowsFar and Away
The Rum DiaryBlow
An Officer and a GentlemanTraining Day
Vanilla SkyWitness
Autumn in New YorkForrest GumpJerry Maguire
Mo' Better BluesOut of Sight
Top GunAmerican Ganster
Raiders of the Lost ArkRemember the Titans
Six Days Seven NightsDejaVu
First KnightMission Impossible III
SabrinaSommersby
TranscendenceSleepless in Seattle
Working GirlOut of Time
SolarisBatman & Robin
Losin ItOblivion
Regarding HenryThe Preacher's Wife
ArbitrageIntolerable Cruelty
John QMalcolm X
Ocean's ElevenRock of Ages
The AmericanAmelia
FlightThe Tourist
ChocolatThe Hoax
The Money PitUp in the Air
Cast AwayKnight and Day
The Good GermanLeatherheadsOne Fine Day
Random HeartsApollo 13
Bee SeasonCharlie Wilson's War
O Brother, Where Art Thou?What Lies BeneathNights in Rodanthe
FirewallCloud Atlas
Extremely Loud & Incredibly CloseCaptain PhillipsShall We Dance
20 30 40 50
actress_age
fct_
reor
der(
Mov
ie, a
ctre
ss_a
ge, d
esc
= T
RU
E)
Not a bad looking chart. We can tweak it a little more and turn it into
Lollipop plot
This time we’re going to use a new geom_: geom_segment()ggplot(ages,
aes(x=actress_age, y=fct_reorder(Movie, actress_age, desc=TRUE))) +geom_segment(
aes(x = 0,xend = actress_age,yend = fct_reorder(Movie, actress_age, desc=TRUE)),color = "gray50") +geom_point()
4
Edward ScissorhandsSleepy Hollow
What's Eating Gilbert GrapePirates of the Caribbean
Blade RunnerHe Got GamePretty Woman
Empire Strikes BackRisky Business
SplashDark ShadowsFar and Away
The Rum DiaryBlow
An Officer and a GentlemanTraining Day
Vanilla SkyWitness
Autumn in New YorkForrest GumpJerry Maguire
Mo' Better BluesOut of Sight
Top GunAmerican Ganster
Raiders of the Lost ArkRemember the Titans
Six Days Seven NightsDejaVu
First KnightMission Impossible III
SabrinaSommersby
TranscendenceSleepless in Seattle
Working GirlOut of Time
SolarisBatman & Robin
Losin ItOblivion
Regarding HenryThe Preacher's Wife
ArbitrageIntolerable Cruelty
John QMalcolm X
Ocean's ElevenRock of Ages
The AmericanAmelia
FlightThe Tourist
ChocolatThe Hoax
The Money PitUp in the Air
Cast AwayKnight and Day
The Good GermanLeatherheadsOne Fine Day
Random HeartsApollo 13
Bee SeasonCharlie Wilson's War
O Brother, Where Art Thou?What Lies BeneathNights in Rodanthe
FirewallCloud Atlas
Extremely Loud & Incredibly CloseCaptain PhillipsShall We Dance
0 20 40 60
actress_age
fct_
reor
der(
Mov
ie, a
ctre
ss_a
ge, d
esc
= T
RU
E)
Looking interesting, right?
If we wanted to publish this on a website or share on social media, we’ll need to clean up the labels and adda title and add a source line.
That’s easy to do.ggplot(ages,
aes(x=actress_age, y=fct_reorder(Movie, actress_age, desc=TRUE))) +geom_segment(
aes(x = 0,y=fct_reorder(Movie, actress_age, desc=TRUE),xend = actress_age,yend = fct_reorder(Movie, actress_age, desc=TRUE)),color = "gray50") +geom_point() +
5
# NEW CODE BELOWlabs(x="Actress age", y="Movie",
title = "Actress ages in movies",subtitle = "for R for Journalists class",caption = "Data from Vulture.com and IMDB") +
theme_minimal()
Edward ScissorhandsSleepy Hollow
What's Eating Gilbert GrapePirates of the Caribbean
Blade RunnerHe Got GamePretty Woman
Empire Strikes BackRisky Business
SplashDark ShadowsFar and Away
The Rum DiaryBlow
An Officer and a GentlemanTraining Day
Vanilla SkyWitness
Autumn in New YorkForrest GumpJerry Maguire
Mo' Better BluesOut of Sight
Top GunAmerican Ganster
Raiders of the Lost ArkRemember the Titans
Six Days Seven NightsDejaVu
First KnightMission Impossible III
SabrinaSommersby
TranscendenceSleepless in Seattle
Working GirlOut of Time
SolarisBatman & Robin
Losin ItOblivion
Regarding HenryThe Preacher's Wife
ArbitrageIntolerable Cruelty
John QMalcolm X
Ocean's ElevenRock of Ages
The AmericanAmelia
FlightThe Tourist
ChocolatThe Hoax
The Money PitUp in the Air
Cast AwayKnight and Day
The Good GermanLeatherheadsOne Fine Day
Random HeartsApollo 13
Bee SeasonCharlie Wilson's War
O Brother, Where Art Thou?What Lies BeneathNights in Rodanthe
FirewallCloud Atlas
Extremely Loud & Incredibly CloseCaptain PhillipsShall We Dance
0 20 40 60
Actress age
Mov
ie
for R for Journalists class
Actress ages in movies
Data from Vulture.com and IMDB
So we added a lot of information to the labs() function: x, y, title, subtitle, and caption.
We also added theme_minimal() which changed a lot of the style, such as the gray grid background.
What if we wanted to clean it up even more?
It’s such a tall chart, it’s difficult to keep track of the actual age represented by the lollipop.
Let’s get rid of the grids and add the numbers to the right of each dot.
6
ggplot(ages,aes(x=actress_age, y=fct_reorder(Movie, actress_age, desc=TRUE))) +
geom_segment(aes(x = 0,
y=fct_reorder(Movie, actress_age, desc=TRUE),xend = actress_age,yend = fct_reorder(Movie, actress_age, desc=TRUE)),color = "gray50") +geom_point() +
labs(x="Actress age", y="Movie",title = "Actress ages in movies",subtitle = "for R for Journalists class",caption = "Data from Vulture.com and IMDB") +
theme_minimal() +# NEW CODE BELOWgeom_text(aes(label=actress_age), hjust=-.5) +theme(panel.border = element_blank(),
panel.grid.major = element_blank(),panel.grid.minor = element_blank(),axis.line = element_blank(),axis.text.x = element_blank())
7
28
34
33
22
29
27
34
32
30
29
35
23
29
22
27
31
33
30
29
39
4244
1920
19
36
26
21
35
2524
30
33
23
28
24
28
27
30
37
34
33
38
33
28
42
34
32
34
3738
36
34
27
22
30
30
28
58
41
36
43
35
34
23
36
31
28
40
37
41
4746
54
Edward ScissorhandsSleepy Hollow
What's Eating Gilbert GrapePirates of the Caribbean
Blade RunnerHe Got GamePretty Woman
Empire Strikes BackRisky Business
SplashDark ShadowsFar and Away
The Rum DiaryBlow
An Officer and a GentlemanTraining Day
Vanilla SkyWitness
Autumn in New YorkForrest GumpJerry Maguire
Mo' Better BluesOut of Sight
Top GunAmerican Ganster
Raiders of the Lost ArkRemember the Titans
Six Days Seven NightsDejaVu
First KnightMission Impossible III
SabrinaSommersby
TranscendenceSleepless in Seattle
Working GirlOut of Time
SolarisBatman & Robin
Losin ItOblivion
Regarding HenryThe Preacher's Wife
ArbitrageIntolerable Cruelty
John QMalcolm X
Ocean's ElevenRock of Ages
The AmericanAmelia
FlightThe Tourist
ChocolatThe Hoax
The Money PitUp in the Air
Cast AwayKnight and Day
The Good GermanLeatherheadsOne Fine Day
Random HeartsApollo 13
Bee SeasonCharlie Wilson's War
O Brother, Where Art Thou?What Lies BeneathNights in Rodanthe
FirewallCloud Atlas
Extremely Loud & Incredibly CloseCaptain PhillipsShall We Dance
Actress age
Mov
ie
for R for Journalists class
Actress ages in movies
Data from Vulture.com and IMDB
So, we added two new ggplot2 elements: geom_text() and theme().
We passed the actress_age variable to label and also used hjust= which means horizontally adjust thelocation. Alternatively, vjust would adjust vertically.
In theme() there are a bunch of things passed, including panel.border and axis.text.x and made them equalelement_blank().
Each piece of the chart can be customized and eliminated with *element_blank().
Not bad looking!
Let’s save it.
8
Saving ggplots
We’ll use ggsave() from the ggplot2 package.
File types that can be exported:
• png• tex• pdf• jpeg• tiff• bmp• svg
You can specify the width of the image in units of “in”, “cm”, “or mm”.
Otherwise it saves based on the size of how it displayed on your screen.ggsave("actress_ages.png")
## Saving 6.5 x 4.5 in image
How’s it look?
Ew, okay. Needs some adjustment. I guess we can’t go with the default display for this particular chart.ggsave("actress_ages_adjusted.png", width=20, height=30, units="cm")
9
10
Much better!
You could then save it as a .svg file and tweak it even further in Adobe Illustrator or Inkscape.
Alright, I’m going to tweak it some more by adding actor ages. We just need to adjust the geom_segment()and another geom_point() layer so it uses the actor_age variable.# First, let's permanently reorder the data frame so we don't have to keep using fct_reorder
library(dplyr)
ages_reordered %mutate(Movie=fct_reorder(Movie, desc(actor_age)))
ggplot(ages_reordered) +geom_segment(
aes(x = actress_age,y = Movie,xend = actor_age,yend = Movie),color = "gray50") +
geom_point(aes(x=actress_age, y=Movie), color="dark green") +geom_point(aes(x=actor_age, y=Movie), color="dark blue") +labs(x="", y="",
title = "Actor and actress ages in movies",subtitle = "for R for Journalists class",caption = "Data from Vulture.com and IMDB") +
theme_minimal() +geom_text(aes(x=actress_age, y=Movie, label=actress_age),
hjust=ifelse(ages$actress_age
28
34
33
22
29
27
34
32
3029
35
23
29
22
27
31
33
3029
39
42
44
19
20
19
3626
21
35
25
2430
3323
28
24
28
27
30
37
3433
38
3328
42
3432
34
37
38
36
34
27
22
30
30
28
58
4136
4335
34
2336
31
28
40
37
41
4746
54
35
37
41
43
45
46
47
48
5152
57
37
38
39
42
46
48
5355
57
58
63
27
30
36
3737
43
47
48
4849
2021
23
29
34
39
43
47
5050
35
3637
39
4041
42
45
46
48
49
32
40
43
45
50
55
5657
5960
63
2729
36
37
38
44
51
5556
57
ArbitrageFirewallAmelia
Nights in RodantheWhat Lies Beneath
Captain PhillipsFlight
Random HeartsThe Hoax
Bee SeasonCloud Atlas
Extremely Loud & Incredibly CloseShall We Dance
Six Days Seven NightsSabrina
American GansterCharlie Wilson's War
DejaVuAutumn in New York
OblivionRock of Ages
The AmericanTranscendenceDark Shadows
Out of TimeRegarding Henry
The Rum DiaryUp in the Air
John QKnight and Day
The TouristLeatherheads
Training DayWorking GirlFirst Knight
Remember the TitansThe Good German
Cast AwayHe Got Game
Mission Impossible IIIPirates of the Caribbean
SommersbyIntolerable Cruelty
WitnessSolaris
The Preacher's WifeOcean's Eleven
Pretty WomanBlade Runner
O Brother, Where Art Thou?Vanilla Sky
Apollo 13Raiders of the Lost Ark
BlowChocolat
Empire Strikes BackForrest Gump
Malcolm XOut of Sight
Batman & RobinSleepless in Seattle
Sleepy HollowMo' Better Blues
One Fine DayJerry Maguire
An Officer and a GentlemanWhat's Eating Gilbert Grape
Far and AwayThe Money Pit
Edward ScissorhandsSplash
Top GunRisky Business
Losin It
for R for Journalists class
Actor and actress ages in movies
Data from Vulture.com and IMDB
This time I left the x and y axis labels blank because it seemed redundant.
Scales
Let’s talk about scales.
Axes
12
• scale_x_continuous()• scale_y_continuous()• scale_x_discrete()• scale_y_discrete()
Colors
• scale_color_continuous()• scale_color_manual()• scale_color_brewer()
Fill
• scale_fill_continuous()• scale_fill_manual()
ggplot(ages, aes(x=actor_age, y=actress_age)) + geom_point() +scale_x_continuous(breaks=seq(20,30,2), limits=c(20,30)) +scale_y_continuous(breaks=seq(20,40,4), limits=c(20,40))
## Warning: Removed 67 rows containing missing values (geom_point).
20
24
28
32
36
40
20 22 24 26 28 30
actor_age
actr
ess_
age
By setting breaks in scale_x_continuous(), we limited the breaks where the chart was divided on the xaxis in intervals of 2. And we limited the x axis with limit between 20 and 30. All other data points weredropped.
By setting breaks in scale_y_continuous(), we limited the breaks where the chart was divided on the xaxis in intervals of 4. And we limited the x axis with limit between 20 and 40. All other data points weredropped.
That was limiting the scale by continuous data.
13
Here’s how to set limits on discrete data.ggplot(ages, aes(x=actor)) + geom_bar() +
scale_x_discrete(limits=c("Tom Hanks", "Tom Cruise", "Denzel Washington"))
## Warning: Removed 43 rows containing non-finite values (stat_count).
0
3
6
9
Tom Hanks Tom Cruise Denzel Washington
actor
coun
t
Scales for color and fill
It’s possible to manually change the colors of your chart.
You can use hex symbols or the name of a color if it’s recognized.
We’ll use scale_fill_manual().library(dplyr)
avg_age %group_by(actor) %>%mutate(age_diff = actor_age-actress_age) %>%summarize(average_age_diff = mean(age_diff))
ggplot(avg_age, aes(x=actor, y=average_age_diff, fill=actor)) +geom_bar(stat="identity") +theme(legend.position="none") + # This removes the legendscale_fill_manual(values=c("aquamarine", "darkorchid", "deepskyblue2", "lemonchiffon2",
"orange", "peachpuff3", "tomato"))
14
http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
0
5
10
15
Denzel Washington George Clooney Harrison Ford Johnny Depp Richard Gere Tom Cruise Tom Hanks
actor
aver
age_
age_
diff
You can also specify a color palette using scale_fill_brewer() or scale_color_brewer()ggplot(avg_age, aes(x=actor, y=average_age_diff, fill=actor)) +
geom_bar(stat="identity") +theme(legend.position="none") +scale_fill_brewer()
0
5
10
15
Denzel Washington George Clooney Harrison Ford Johnny Depp Richard Gere Tom Cruise Tom Hanks
actor
aver
age_
age_
diff
Check out some of the other palette options that can be passed to brewer.ggplot(avg_age, aes(x=actor, y=average_age_diff, fill=actor)) +
geom_bar(stat="identity") +theme(legend.position="none") +
15
https://learnr.wordpress.com/2009/04/15/ggplot2-qualitative-colour-palettes/
scale_fill_brewer(palette="Pastel1")
0
5
10
15
Denzel Washington George Clooney Harrison Ford Johnny Depp Richard Gere Tom Cruise Tom Hanks
actor
aver
age_
age_
diff
Did you know that someone made a Wes Anderson color palette package based on his different movies?
Annotations
You can annotate charts with annotate() and geom_hline() or geom_vline().ggplot(ages, aes(x=actor_age, y=actress_age)) +
geom_point() +geom_hline(yintercept=50, color="red") +annotate("text", x=40, y=51, label="Random text for some reason", color="red")
16
https://github.com/karthik/wesanderson
Random text for some reason
20
30
40
50
20 30 40 50 60
actor_age
actr
ess_
age
Themes
You’ve seen an example of a theme used in a previous chart. theme_bw().
But there are many more that have been created and collected into the ggthemes library.
Here’s one for the economist# If you don't have ggthemes installed yet, uncomment the line below and run it#install.packages("ggthemes")
library(ggthemes)ggplot(ages, aes(x=actor_age, y=actress_age, color=actor)) +
geom_point() +theme_economist() +scale_colour_economist()
17
20
30
40
50
20 30 40 50 60actor_age
actr
ess_
age
actorDenzel WashingtonGeorge Clooney
Harrison FordJohnny Depp
Richard GereTom Cruise
Tom Hanks
Here’s one based on FiveThirtyEight’s style (though it’s not the official one).ggplot(ages, aes(x=actor_age, y=actress_age, color=actor)) +
geom_point() +theme_fivethirtyeight()
18
20
30
40
50
20 30 40 50 60
actorDenzel Washington
George Clooney
Harrison Ford
Johnny Depp
Richard Gere
Tom Cruise
Tom Hanks
Check out all the other ones currently available.
It’s not difficult to make your own. It’s just time consuming.
It involves tweaking every little detail, like text, and colors, and how the grids should look.
Check out the theme that the Associated Press uses. They posted it on their repo and by loading their ownpackage, they can just add theme_ap() at the end of their charts to transform it to AP style.
Your turn
Challenge yourself with these exercises so you’ll retain the knowledge of this section.
Instructions on how to run the exercise app are on the intro page to this section.
19
https://cran.r-project.org/web/packages/ggthemes/vignettes/ggthemes.htmlhttps://github.com/associatedpress/aptheme/blob/master/R/theme_ap.Rhttps://github.com/associatedpress/apthemehttp://code.r-journalism.com/chapter-4/#section-customizing-chartshttps://learn.r-journalism.com/en/visualizing/
Reordering chart labelsLollipop plotSaving ggplotsScalesScales for color and fillAnnotationsThemesYour turn