    Customizing charts
    Andrew Ba Tran
    Journalism Courses Knight Center

    Contents

    • Reordering chart labels
    • Lollipop plot
    • Saving ggplots
    • Scales
    • Scales for color and fill
    • Annotations
    • Themes
    • Your turn

    This is from the fourth chapter of learn.r-journalism.com.

    Let’s bring that data back in again.

        library(readr)

        ages <- read_csv("data/ages.csv")

    Remember that Dot Plot we made before?

        library(ggplot2)

        ggplot(ages, aes(x=actress_age, y=Movie)) +
          geom_point()



    [Figure: dot plot of actress_age (x axis, 20 to 50) against Movie (y axis), one point per movie.]

    It’s not that great, right? It’s in reverse alphabetical order.

    Let’s reorder it based on age.

    Reordering chart labels

    This means we need to transform the data.

    The easiest way to do this is with the package forcats (https://forcats.tidyverse.org/index.html), which (surprise!) is also part of the tidyverse universe.

    The function is fct_reorder() and it works like this:

        # If you don't have forcats installed yet, uncomment the line below and run
        # install.packages("forcats")

        library(forcats)

        # .desc = TRUE (note the leading dot) sorts the movies in descending order of age
        ggplot(ages,
               aes(x=actress_age, y=fct_reorder(Movie, actress_age, .desc=TRUE))) +
          geom_point()
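    To see what fct_reorder() does on its own, here is a minimal sketch on a tiny made-up data set. The three movies and ages below are hypothetical stand-ins, not rows from ages.csv. It compares the default alphabetical factor levels with the levels after reordering by a numeric variable.

```r
library(forcats)

# Hypothetical toy data: three movies and made-up ages (not from ages.csv)
movie <- c("Splash", "Top Gun", "Witness")
age   <- c(25, 30, 20)

# A plain factor sorts its levels alphabetically
alphabetical <- levels(factor(movie))

# fct_reorder() sorts the levels by a second variable (ascending by default)
by_age <- levels(fct_reorder(movie, age))

# .desc = TRUE reverses that order
by_age_desc <- levels(fct_reorder(movie, age, .desc = TRUE))

print(alphabetical)
print(by_age)
print(by_age_desc)
```

    ggplot2 draws the first level of a discrete y axis at the bottom of the chart, so the order of the levels is the bottom-to-top order of the rows.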

    [Figure: dot plot of actress_age (x axis, 20 to 50) against Movie (y axis), with the movies ordered by actress age. The y axis title shows the full fct_reorder() call.]

    Not a bad-looking chart. We can tweak it a little more and turn it into a lollipop plot.

    Lollipop plot

    This time we’re going to use a new geom_: geom_segment()

        ggplot(ages,
               aes(x=actress_age, y=fct_reorder(Movie, actress_age, .desc=TRUE))) +
          geom_segment(
            aes(x = 0,
                xend = actress_age,
                yend = fct_reorder(Movie, actress_age, .desc=TRUE)),
            color = "gray50") +
          geom_point()

    [Figure: lollipop plot of actress_age (x axis, 0 to 60) against Movie (y axis), with a gray segment running from 0 to each point. The y axis title shows the full fct_reorder() call.]

    Looking interesting, right?
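    To unpack what geom_segment() contributes, here is a small sketch on made-up data. The data frame toy and its values are hypothetical. Each segment runs from (x, y) to (xend, yend); fixing x at 0 and keeping y and yend on the same row gives the stick of the lollipop, and geom_point() adds the dot at the end.

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not from ages.csv)
toy <- data.frame(
  movie = c("Movie A", "Movie B", "Movie C"),
  age   = c(22, 35, 41)
)

lollipop <- ggplot(toy, aes(x = age, y = movie)) +
  # The stick: starts at x = 0 and ends at the data value, staying on the same row
  geom_segment(aes(x = 0, xend = age, yend = movie), color = "gray50") +
  # The dot: a point drawn at the data value
  geom_point()

# Inspect the coordinates ggplot2 computed for the segment layer (layer 1)
segments <- layer_data(lollipop, 1)
print(segments[, c("x", "xend")])
```

    The y aesthetic is not repeated inside geom_segment() because the layer inherits it from the ggplot() call.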

    If we want to publish this on a website or share it on social media, we’ll need to clean up the labels and add a title and a source line.

    That’s easy to do.

        ggplot(ages,
               aes(x=actress_age, y=fct_reorder(Movie, actress_age, .desc=TRUE))) +
          geom_segment(
            aes(x = 0,
                y = fct_reorder(Movie, actress_age, .desc=TRUE),
                xend = actress_age,
                yend = fct_reorder(Movie, actress_age, .desc=TRUE)),
            color = "gray50") +
          geom_point() +
          # NEW CODE BELOW
          labs(x="Actress age", y="Movie",
               title = "Actress ages in movies",
               subtitle = "for R for Journalists class",
               caption = "Data from Vulture.com and IMDB") +
          theme_minimal()

    [Figure: lollipop plot titled "Actress ages in movies", subtitle "for R for Journalists class", caption "Data from Vulture.com and IMDB". X axis: Actress age (0 to 60). Y axis: Movie.]

    So we added a lot of information to the labs() function: x, y, title, subtitle, and caption.

    We also added theme_minimal(), which changed much of the styling, such as removing the gray panel background.

    What if we wanted to clean it up even more?

    It’s such a tall chart, it’s difficult to keep track of the actual age represented by the lollipop.

    Let’s get rid of the grids and add the numbers to the right of each dot.

        ggplot(ages,
               aes(x=actress_age, y=fct_reorder(Movie, actress_age, .desc=TRUE))) +
          geom_segment(
            aes(x = 0,
                y = fct_reorder(Movie, actress_age, .desc=TRUE),
                xend = actress_age,
                yend = fct_reorder(Movie, actress_age, .desc=TRUE)),
            color = "gray50") +
          geom_point() +
          labs(x="Actress age", y="Movie",
               title = "Actress ages in movies",
               subtitle = "for R for Journalists class",
               caption = "Data from Vulture.com and IMDB") +
          theme_minimal() +
          # NEW CODE BELOW
          geom_text(aes(label=actress_age), hjust=-.5) +
          theme(panel.border = element_blank(),
                panel.grid.major = element_blank(),
                panel.grid.minor = element_blank(),
                axis.line = element_blank(),
                axis.text.x = element_blank())

    [Figure: lollipop plot titled "Actress ages in movies", subtitle "for R for Journalists class", caption "Data from Vulture.com and IMDB". The grid lines and x axis tick labels are gone, and each dot is labeled with the actress age to its right. X axis: Actress age. Y axis: Movie.]

    So, we added two new ggplot2 elements: geom_text() and theme().

    We passed the actress_age variable to label and also used hjust=, which adjusts the location of the text horizontally. Alternatively, vjust would adjust it vertically.

    In theme() we passed several elements, including panel.border and axis.text.x, and set each of them equal to element_blank().

    Each piece of the chart can be customized, or eliminated with element_blank().
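    As a small illustration of hjust and element_blank(), here is a sketch on made-up data. The data frame toy is hypothetical. With hjust = 0 a label starts at the point, with hjust = 1 it ends at the point, and a negative value pushes it further to the right.

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not from ages.csv)
toy <- data.frame(
  movie = c("Movie A", "Movie B", "Movie C"),
  age   = c(22, 35, 41)
)

labelled <- ggplot(toy, aes(x = age, y = movie)) +
  geom_point() +
  # hjust = -0.5 shifts each label right by half of its own width
  geom_text(aes(label = age), hjust = -0.5) +
  theme_minimal() +
  # element_blank() removes a theme element entirely
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text.x = element_blank())

# Typing `labelled` at the console draws the chart
```

    Removing axis.text.x is reasonable here only because the labels next to the dots already carry the values.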

    Not bad looking!

    Let’s save it.

    Saving ggplots

    We’ll use ggsave() from the ggplot2 package.

    File types that can be exported:

    • png
    • tex
    • pdf
    • jpeg
    • tiff
    • bmp
    • svg

    You can specify the width and height of the image in units of “in”, “cm”, or “mm”.

    Otherwise it saves based on the size of the plot as displayed on your screen.

        ggsave("actress_ages.png")

        ## Saving 6.5 x 4.5 in image

    How’s it look?

    Ew, okay. Needs some adjustment. I guess we can’t go with the default display for this particular chart.

        ggsave("actress_ages_adjusted.png", width=20, height=30, units="cm")

    Much better!

    You could then save it as a .svg file and tweak it even further in Adobe Illustrator or Inkscape.
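    Here is a minimal sketch of ggsave() with explicit dimensions, using a made-up chart. The data, the object name chart, and the file name are all hypothetical. It writes a PDF to a temporary folder, on the assumption that the PDF device is available on any R installation; swapping the extension for .png or .svg changes the format.

```r
library(ggplot2)

# Hypothetical toy chart (made-up values)
toy <- data.frame(movie = c("Movie A", "Movie B"), age = c(22, 35))
chart <- ggplot(toy, aes(x = age, y = movie)) + geom_point()

# Write to a temporary folder so the sketch does not clutter the project
out_file <- file.path(tempdir(), "toy_chart.pdf")

# Passing plot = explicitly avoids relying on the last plot that was displayed
ggsave(out_file, plot = chart, width = 20, height = 30, units = "cm")

file.exists(out_file)
```

    When plot is not given, ggsave() saves the last plot displayed, which is what the calls above rely on.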

    Alright, I’m going to tweak it some more by adding actor ages. We just need to adjust the geom_segment() and add another geom_point() layer so it uses the actor_age variable.

        # First, let's permanently reorder the data frame so we don't have to keep using fct_reorder

        library(dplyr)

        ages_reordered <- ages %>%
          mutate(Movie=fct_reorder(Movie, desc(actor_age)))

        ggplot(ages_reordered) +
          geom_segment(
            aes(x = actress_age,
                y = Movie,
                xend = actor_age,
                yend = Movie),
            color = "gray50") +
          geom_point(aes(x=actress_age, y=Movie), color="dark green") +
          geom_point(aes(x=actor_age, y=Movie), color="dark blue") +
          labs(x="", y="",
               title = "Actor and actress ages in movies",
               subtitle = "for R for Journalists class",
               caption = "Data from Vulture.com and IMDB") +
          theme_minimal() +
          geom_text(aes(x=actress_age, y=Movie, label=actress_age),
                    hjust=ifelse(ages$actress_age

    [Figure: chart titled "Actor and actress ages in movies", subtitle "for R for Journalists class", caption "Data from Vulture.com and IMDB". Each movie has two dots, one for the actress age and one for the actor age, joined by a gray segment and labeled with the ages. The axis titles are blank.]

    This time I left the x and y axis labels blank because they seemed redundant.
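    The reordering step can be tried on a small made-up data frame. The sketch below uses hypothetical movies and ages, not ages.csv. It stores the level order in the data with mutate() and fct_reorder(), then draws one segment per movie between the two ages.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical toy data (made-up values, not from ages.csv)
toy <- data.frame(
  Movie       = c("Movie A", "Movie B", "Movie C"),
  actor_age   = c(30, 50, 40),
  actress_age = c(28, 31, 35)
)

# Store the ordering in the data frame itself: the oldest actor becomes the first level
toy_reordered <- toy %>%
  mutate(Movie = fct_reorder(Movie, desc(actor_age)))

print(levels(toy_reordered$Movie))

# Each segment spans from the actress age to the actor age on the same row
dumbbell <- ggplot(toy_reordered) +
  geom_segment(aes(x = actress_age, xend = actor_age, y = Movie, yend = Movie),
               color = "gray50") +
  geom_point(aes(x = actress_age, y = Movie), color = "darkgreen") +
  geom_point(aes(x = actor_age, y = Movie), color = "darkblue")
```

    Because the order now lives in the Movie column, every layer can refer to Movie directly instead of repeating the fct_reorder() call.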

    Scales

    Let’s talk about scales.

    Axes

    • scale_x_continuous()
    • scale_y_continuous()
    • scale_x_discrete()
    • scale_y_discrete()

    Colors

    • scale_color_continuous()
    • scale_color_manual()
    • scale_color_brewer()

    Fill

    • scale_fill_continuous()
    • scale_fill_manual()

        ggplot(ages, aes(x=actor_age, y=actress_age)) +
          geom_point() +
          scale_x_continuous(breaks=seq(20,30,2), limits=c(20,30)) +
          scale_y_continuous(breaks=seq(20,40,4), limits=c(20,40))

        ## Warning: Removed 67 rows containing missing values (geom_point).

    [Figure: scatter plot of actor_age (x axis, 20 to 30 with ticks every 2) against actress_age (y axis, 20 to 40 with ticks every 4).]

    By setting breaks in scale_x_continuous(), we placed the tick marks on the x axis at intervals of 2, and with limits we restricted the x axis to the range from 20 to 30. Data points outside that range were dropped.

    By setting breaks in scale_y_continuous(), we placed the tick marks on the y axis at intervals of 4, and with limits we restricted the y axis to the range from 20 to 40. Data points outside that range were dropped as well, which is what the warning reports.
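    To make the effect of breaks and limits concrete, here is a sketch on made-up data. The data frame toy is hypothetical. It shows the tick positions that seq() produces and counts, in plain R, the rows that fall outside the limits and would therefore be left out of the chart.

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not from ages.csv)
toy <- data.frame(
  actor_age   = c(22, 25, 29, 34, 45),
  actress_age = c(21, 38, 44, 30, 27)
)

# The tick positions that breaks = seq(20, 30, 2) asks for
x_breaks <- seq(20, 30, 2)
print(x_breaks)

# Rows inside both limits are kept; everything else is dropped from the plot
kept <- toy$actor_age >= 20 & toy$actor_age <= 30 &
  toy$actress_age >= 20 & toy$actress_age <= 40
n_dropped <- sum(!kept)
print(n_dropped)

limited <- ggplot(toy, aes(x = actor_age, y = actress_age)) +
  geom_point() +
  scale_x_continuous(breaks = x_breaks, limits = c(20, 30)) +
  scale_y_continuous(breaks = seq(20, 40, 4), limits = c(20, 40))
```

    A row is dropped as soon as either of its two values is outside the corresponding limit, which is why a narrow x range can remove most of a data set.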

    That was limiting the scale by continuous data.

    Here’s how to set limits on discrete data.

        ggplot(ages, aes(x=actor)) +
          geom_bar() +
          scale_x_discrete(limits=c("Tom Hanks", "Tom Cruise", "Denzel Washington"))

        ## Warning: Removed 43 rows containing non-finite values (stat_count).

    [Figure: bar chart of count (y axis, 0 to 9) by actor (x axis) for Tom Hanks, Tom Cruise, and Denzel Washington.]
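    The same idea on a small made-up example: the actor names below are hypothetical placeholders. The limits vector both selects which categories appear and fixes their left-to-right order, and rows for any other category are left out.

```r
library(ggplot2)

# Hypothetical toy data: placeholder actor names, not from ages.csv
toy <- data.frame(
  actor = c("Actor A", "Actor A", "Actor B", "Actor C", "Actor C", "Actor C")
)

# Categories to show, in the order the bars should appear
keep <- c("Actor C", "Actor A")

# Rows whose actor is not listed in the limits are left out of the chart
n_removed <- sum(!toy$actor %in% keep)
print(n_removed)

bars <- ggplot(toy, aes(x = actor)) +
  geom_bar() +
  scale_x_discrete(limits = keep)
```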

    Scales for color and fill

    It’s possible to manually change the colors of your chart.

    You can use hex codes or the name of a color if it’s recognized (see http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf).

    We’ll use scale_fill_manual().

        library(dplyr)

        avg_age <- ages %>%
          group_by(actor) %>%
          mutate(age_diff = actor_age-actress_age) %>%
          summarize(average_age_diff = mean(age_diff))

        ggplot(avg_age, aes(x=actor, y=average_age_diff, fill=actor)) +
          geom_bar(stat="identity") +
          theme(legend.position="none") + # This removes the legend
          scale_fill_manual(values=c("aquamarine", "darkorchid", "deepskyblue2", "lemonchiffon2",
                                     "orange", "peachpuff3", "tomato"))

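    [Figure: bar chart of average_age_diff (y axis, 0 to 15) by actor (x axis) for Denzel Washington, George Clooney, Harrison Ford, Johnny Depp, Richard Gere, Tom Cruise, and Tom Hanks, with one manually chosen fill color per actor.]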
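    Here is a sketch of the summarize-then-color pattern on made-up data. The actor names and ages are hypothetical. It also shows one optional variation that is not used in the text: giving scale_fill_manual() a named vector, so each color is tied to a category by name instead of by position.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (placeholder names and made-up ages)
toy <- data.frame(
  actor       = c("Actor A", "Actor A", "Actor B"),
  actor_age   = c(40, 50, 30),
  actress_age = c(30, 30, 28)
)

# One row per actor with the average age gap
avg_toy <- toy %>%
  group_by(actor) %>%
  mutate(age_diff = actor_age - actress_age) %>%
  summarize(average_age_diff = mean(age_diff))

print(avg_toy)

# A named vector maps each color to a specific actor; hex codes and color names both work
actor_colors <- c("Actor A" = "#7fffd4", "Actor B" = "tomato")

bars <- ggplot(avg_toy, aes(x = actor, y = average_age_diff, fill = actor)) +
  geom_bar(stat = "identity") +
  theme(legend.position = "none") +
  scale_fill_manual(values = actor_colors)
```

    With an unnamed vector, as in the text, the colors are assigned to the categories in order, so the vector needs at least as many colors as there are categories.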

    You can also specify a color palette using scale_fill_brewer() or scale_color_brewer()

        ggplot(avg_age, aes(x=actor, y=average_age_diff, fill=actor)) +
          geom_bar(stat="identity") +
          theme(legend.position="none") +
          scale_fill_brewer()

    [Figure: bar chart of average_age_diff (y axis, 0 to 15) by actor (x axis) for the same seven actors, filled with the default brewer palette.]

    Check out some of the other palette options that can be passed to brewer (https://learnr.wordpress.com/2009/04/15/ggplot2-qualitative-colour-palettes/).

        ggplot(avg_age, aes(x=actor, y=average_age_diff, fill=actor)) +
          geom_bar(stat="identity") +
          theme(legend.position="none") +
          scale_fill_brewer(palette="Pastel1")

    [Figure: bar chart of average_age_diff (y axis, 0 to 15) by actor (x axis) for the same seven actors, filled with the Pastel1 palette.]
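    To look at a brewer palette outside of a chart, the sketch below calls the RColorBrewer package directly. This assumes RColorBrewer is installed, which is usually the case once ggplot2 is. The data frame toy is made up for illustration.

```r
library(ggplot2)
library(RColorBrewer)

# The seven hex codes that the "Pastel1" palette supplies for seven categories
pastel <- brewer.pal(7, "Pastel1")
print(pastel)

# brewer.pal.info lists every palette name with its category and maximum number of colors
print(head(brewer.pal.info))

# Hypothetical toy data (placeholder names and made-up values)
toy <- data.frame(actor = paste("Actor", LETTERS[1:7]), value = 1:7)

bars <- ggplot(toy, aes(x = actor, y = value, fill = actor)) +
  geom_bar(stat = "identity") +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Pastel1")
```

    Checking maxcolors in brewer.pal.info before picking a palette helps avoid choosing one with fewer colors than there are categories.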

    Did you know that someone made a Wes Anderson color palette package based on his different movies (https://github.com/karthik/wesanderson)?

    Annotations

    You can annotate charts with annotate() and geom_hline() or geom_vline().

        ggplot(ages, aes(x=actor_age, y=actress_age)) +
          geom_point() +
          geom_hline(yintercept=50, color="red") +
          annotate("text", x=40, y=51, label="Random text for some reason", color="red")

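    [Figure: scatter plot of actor_age (x axis, 20 to 60) against actress_age (y axis, 20 to 50), with a red horizontal line and the red annotation "Random text for some reason" above it.]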
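    geom_vline() works the same way as geom_hline() but takes xintercept. Here is a sketch on made-up data that places a vertical reference line at the mean and labels it. The data frame toy and the label text are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not from ages.csv)
toy <- data.frame(
  actor_age   = c(25, 35, 45, 55),
  actress_age = c(24, 30, 33, 41)
)

mean_actor_age <- mean(toy$actor_age)

annotated <- ggplot(toy, aes(x = actor_age, y = actress_age)) +
  geom_point() +
  # Vertical reference line at the mean actor age
  geom_vline(xintercept = mean_actor_age, color = "red", linetype = "dashed") +
  # hjust = 0 left-aligns the label so it starts just right of the line
  annotate("text", x = mean_actor_age + 1, y = 40,
           label = paste("Mean actor age:", mean_actor_age),
           color = "red", hjust = 0)
```

    The x and y passed to annotate() are in data units, so the label moves with the axes if the scales change.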

    Themes

    You’ve seen examples of themes in previous charts, such as theme_bw() and the theme_minimal() used above.

    But there are many more that have been created and collected into the ggthemes package.

    Here’s one in the style of The Economist.

        # If you don't have ggthemes installed yet, uncomment the line below and run it
        # install.packages("ggthemes")

        library(ggthemes)

        ggplot(ages, aes(x=actor_age, y=actress_age, color=actor)) +
          geom_point() +
          theme_economist() +
          scale_colour_economist()

    [Figure: scatter plot of actor_age (x axis, 20 to 60) against actress_age (y axis, 20 to 50) in the Economist theme, colored by actor: Denzel Washington, George Clooney, Harrison Ford, Johnny Depp, Richard Gere, Tom Cruise, Tom Hanks.]

    Here’s one based on FiveThirtyEight’s style (though it’s not the official one).

        ggplot(ages, aes(x=actor_age, y=actress_age, color=actor)) +
          geom_point() +
          theme_fivethirtyeight()

    [Figure: the same scatter plot in the FiveThirtyEight-style theme, with both axes running as before (x from 20 to 60, y from 20 to 50) and a legend for the seven actors.]

    Check out all the other ones currently available.

    It’s not difficult to make your own. It’s just time consuming.

    It involves tweaking every little detail, like text, and colors, and how the grids should look.

    Check out the theme that the Associated Press uses. They posted it on their repo, and by loading their own package they can just add theme_ap() at the end of their charts to transform them to AP style.
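    A reusable theme is just a function that returns a theme object. The sketch below shows the general pattern under stated assumptions: the name theme_newsroom() and every setting in it are made up for illustration, and it is not how theme_ap() is written.

```r
library(ggplot2)

# Hypothetical house style: the name and all settings are invented for this sketch
theme_newsroom <- function(base_size = 12) {
  # Start from an existing theme, then override individual elements
  theme_minimal(base_size = base_size) +
    theme(
      plot.title       = element_text(face = "bold"),
      plot.caption     = element_text(color = "gray40", hjust = 0),
      panel.grid.minor = element_blank()
    )
}

# Hypothetical toy data (made-up values)
toy <- data.frame(actor_age = c(25, 35, 45), actress_age = c(24, 30, 33))

styled <- ggplot(toy, aes(x = actor_age, y = actress_age)) +
  geom_point() +
  labs(title = "Toy chart", caption = "Made-up data") +
  theme_newsroom()
```

    Starting from a complete theme such as theme_minimal() means only the elements that differ need to be listed, which keeps the tweaking manageable.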

    Your turn

    Challenge yourself with these exercises so you’ll retain the knowledge of this section.

    Instructions on how to run the exercise app are on the intro page to this section.

    ggthemes vignette: https://cran.r-project.org/web/packages/ggthemes/vignettes/ggthemes.html
    AP theme source: https://github.com/associatedpress/aptheme/blob/master/R/theme_ap.R
    AP theme repo: https://github.com/associatedpress/aptheme
    Exercises: http://code.r-journalism.com/chapter-4/#section-customizing-charts
    Section intro page: https://learn.r-journalism.com/en/visualizing/