
GRAPHICS WITH GGPLOT2, L3 - R3

Julie Scholler - B246

November 2019

Graphics with ggplot2

Why?
• elegant, versatile
• mature and complete graphics system
• very flexible
• carefully chosen default behaviour
• theme system for polishing plot appearance

How?
• the grammar of graphics (Wilkinson, 2005)

The Grammar Of Graphics

The basic idea behind building a plot

• specify blocks/layers
• combine them
• get any kind of graphic

Blocks/layers

• data
• aesthetic mapping
• geometric object
• statistical transformations
• scales
• coordinate system
• position adjustments
• faceting

Syntax

ggplot(data=...) + aes(x=..., y=...) + geom_...()

• Data: what is being visualized
• Aesthetic Mappings: mappings between variables in the data and components of the chart
• Geometric Objects: geometric objects that are used to display the data, such as points, lines, or shapes

First try

ggplot(data)

Aesthetic Mapping

In ggplot: aesthetic = “something you can see”

Examples
• position (on the x and y axes)
• color (“outside” color)
• fill (“inside” color)
• shape (of points)
• linetype
• size

Aesthetic mappings are set with the aes() function.

Second try

ggplot(data) + aes(x = note_totale)

[Figure: empty plot panel, no geometry drawn; x-axis: note_totale]

Geometric Objects (geom)

Examples
• points: geom_point
• lines: geom_line
• bar: geom_bar
• histogram: geom_histogram
• boxplot: geom_boxplot

List of available geometric objects: see the ggplot2 reference list, or run help.search("geom_", package = "ggplot2")

Histogram

ggplot(data) + aes(x = note_totale) + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

[Figure: histogram with the default 30 bins; x-axis: note_totale, y-axis: count]
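The stat_bin() message shown above goes away once the bins are chosen explicitly. A minimal sketch, assuming a simulated stand-in for the course data (notes_demo is hypothetical, as is the bin width of 5 points, picked only for illustration):

```r
library(ggplot2)

# Hypothetical stand-in for the course data set (values are simulated)
set.seed(1)
notes_demo <- data.frame(
  note_totale = round(runif(60, min = 10, max = 100))
)

# binwidth fixes the width of each bin in data units (here 5 points),
# instead of letting stat_bin() fall back to 30 bins
p_binwidth <- ggplot(notes_demo) +
  aes(x = note_totale) +
  geom_histogram(binwidth = 5)

# layer_data() returns the computed bins, useful for checking the result
bins <- layer_data(p_binwidth, 1)
```

Use either bins (number of bins) or binwidth (width of each bin), not both; binwidth is usually easier to interpret on a known scale such as marks.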

Create Good and Effective Graphics

• Labels
    + labs(title=..., subtitle=..., caption=..., x=..., y=..., color=..., etc.)
• Annotations
    + geom_text()
    + geom_text_repel()
• Coordinate
    + coord_flip()
• Scales, Guides, Themes
• Interactivity

ggplot(data = data) + aes(x = note_totale) +
  geom_histogram(bins = 15, fill = "aquamarine3", col = "white") +
  labs(title = "Distribution des notes au QCM",
       x = "Note", y = "Effectif") +
  theme_minimal()

[Figure: histogram titled "Distribution des notes au QCM"; x-axis: Note, y-axis: Effectif]

Themes

The ggplot2 theme system handles non-data plot elements such as

• Axis labels
• Plot background
• Facet label background
• Legend appearance

Built-in themes include:

• theme_gray() (default)
• theme_bw()
• theme_classic()
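A built-in theme can be adjusted further with theme(). A minimal sketch, assuming a simulated stand-in for the course data (notes_demo is hypothetical, and the particular tweaks are only illustrative):

```r
library(ggplot2)

# Hypothetical stand-in for the course data set (values are simulated)
set.seed(1)
notes_demo <- data.frame(
  note_totale = round(runif(60, min = 10, max = 100))
)

# Start from a complete built-in theme, then override individual elements.
# theme() must come after theme_bw(), otherwise theme_bw() resets the changes.
p_theme <- ggplot(notes_demo) +
  aes(x = note_totale) +
  geom_histogram(bins = 15, col = "white") +
  labs(title = "Distribution des notes au QCM") +
  theme_bw() +
  theme(
    panel.grid.minor = element_blank(),       # remove minor grid lines
    plot.title = element_text(face = "bold")  # bold title
  )
```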

Multivariate

ggplot(data = data) + aes(x = note_totale, fill = annee) +
  geom_histogram(bins = 15, col = "white", alpha = 0.6) +
  labs(title = "Distribution des notes au QCM",
       x = "Note", y = "Effectif") +
  theme_minimal()

[Figure: stacked histogram titled "Distribution des notes au QCM"; x-axis: Note, y-axis: Effectif; fill legend annee: L1, L2, L3]
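The single-colour histogram and the histogram filled by annee differ only in where fill is specified: inside aes() it is mapped to a variable, while inside the geom it is set to a constant. A minimal sketch, assuming a simulated stand-in for the course data (notes_demo is hypothetical):

```r
library(ggplot2)

# Hypothetical stand-in for the course data set (values are simulated)
set.seed(1)
notes_demo <- data.frame(
  note_totale = round(runif(60, min = 10, max = 100)),
  annee = rep(c("L1", "L2", "L3"), each = 20)
)

# Mapping: fill depends on a variable, so it goes inside aes()
# and ggplot2 builds a legend for it
p_mapped <- ggplot(notes_demo) +
  aes(x = note_totale, fill = annee) +
  geom_histogram(bins = 15, col = "white", alpha = 0.6)

# Setting: fill is a constant, so it is passed to the geom directly
# and no legend is drawn
p_fixed <- ggplot(notes_demo) +
  aes(x = note_totale) +
  geom_histogram(bins = 15, col = "white", fill = "aquamarine3")
```

Writing aes(fill = "aquamarine3") by mistake would treat the string as a one-level variable: the bars get the first default palette colour and a legend entry, not the requested colour.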

Faceting

• Creates separate graphs for subsets of data
• Two solutions
    1. facet_wrap(): subsets as the levels of a single grouping variable
    2. facet_grid(): subsets as the crossing of two grouping variables
• Facilitates comparison among plots

Syntax

ggplot(data=...) + aes(x=..., y=..., fill=..., color=..., group=...) +
  geom_...() + facet_...(...) +
  labs(...) + theme_minimal()

• Data: what is being visualized
• Aesthetic Mappings: mappings between variables in the data and components of the chart
• Geometric Objects: geometric objects that are used to display the data, such as points, lines, or shapes
• Statistical Transformations: applied to the data to summarize it
• Facets: describe how the data is partitioned into subsets and how these different subsets are plotted

Base histogram

gg <- ggplot(data = data) +
  aes(x = note_totale, fill = annee) +
  geom_histogram(bins = 15, alpha = 0.6, col = "white") +
  labs(title = "Distribution des notes au QCM",
       x = "Note", y = "Effectif") +
  theme_minimal()

facet_wrap()

gg + facet_wrap(~annee)

[Figure: "Distribution des notes au QCM", one histogram panel per year (L1, L2, L3) side by side; x-axis: Note, y-axis: Effectif; legend annee on the right]

Legend position

gg + facet_wrap(~annee) +
  theme(legend.position = "bottom")

[Figure: "Distribution des notes au QCM", panels L1, L2, L3 side by side; x-axis: Note, y-axis: Effectif; legend annee below the plot]

Other use of facet_wrap()

gg + facet_wrap(~annee, ncol=2)

[Figure: "Distribution des notes au QCM", panels L1, L2, L3 arranged in two columns; x-axis: Note, y-axis: Effectif; legend annee on the right]

Use of facet_grid()

gg + facet_grid(annee~sexe)

[Figure: "Distribution des notes au QCM", grid of panels with rows L1, L2, L3 (annee) and columns "Un homme", "Une femme" (sexe); x-axis: Note, y-axis: Effectif; legend annee on the right]

Density chart

ggplot(data = data) + aes(x = note_totale) +
  geom_density(fill = "aquamarine3", color = "white", alpha = 0.6) +
  labs(title = "Distribution des notes au QCM",
       x = "Note", y = "") +
  theme_minimal()

[Figure: density curve titled "Distribution des notes au QCM"; x-axis: Note, y-axis unlabeled (density)]

Density charts

ggplot(data = data) +
  aes(x = note_totale, fill = annee, color = annee) +
  geom_density(alpha = 0.6) +
  labs(title = "Distribution des notes au QCM",
       x = "Note", y = "") +
  theme_minimal()

[Figure: overlaid density curves titled "Distribution des notes au QCM", one per year; x-axis: Note, y-axis unlabeled (density); legend annee: L1, L2, L3]

With ridgelines

library(ggridges)
ggplot(data = data) +
  aes(x = note_totale, fill = annee, col = annee, y = annee) +
  geom_density_ridges(alpha = 0.6, scale = 3) +
  labs(title = "Distribution des notes au QCM",
       x = "Note", y = "") +
  theme_minimal()

[Figure: ridgeline plot titled "Distribution des notes au QCM", one ridge per year (L1, L2, L3) on the y-axis; x-axis: Note; legend annee]

Bar charts

ggplot(data) + aes(x = annee) +
  geom_bar(fill = "aquamarine3") +
  theme_minimal()

[Figure: bar chart of student counts per year; x-axis: annee (L1, L2, L3), y-axis: count]

Bar charts

ggplot(data) + aes(x = annee) +
  geom_bar(fill = "aquamarine3", width = 0.5) +
  theme_minimal()

[Figure: same bar chart with narrower bars; x-axis: annee (L1, L2, L3), y-axis: count]

Bar charts

ggplot(data) + aes(x = annee, fill = bac) +
  geom_bar(width = 0.5) +
  theme_minimal()

[Figure: stacked bar chart; x-axis: annee (L1, L2, L3), y-axis: count; fill legend bac: Bac ES, Bac S, Bac L, Bac STMG, Bac professionnel]

Bar charts

ggplot(data) + aes(x = annee, fill = bac) +
  geom_bar(width = 0.5, position = "fill") +
  theme_minimal()

[Figure: bar chart with bars stacked to proportions from 0 to 1; x-axis: annee (L1, L2, L3), y-axis: count; fill legend bac: Bac ES, Bac S, Bac L, Bac STMG, Bac professionnel]

Bar charts

ggplot(data) + aes(x = annee, fill = bac) +
  geom_bar(width = 0.5, position = "dodge") +
  theme_minimal()

[Figure: bar chart with bars placed side by side; x-axis: annee (L1, L2, L3), y-axis: count; fill legend bac: Bac ES, Bac S, Bac L, Bac STMG, Bac professionnel]

Position adjustment

Set inside the geom, with the position argument

• identity
• stack
• fill
• dodge: side by side
• jitter: useful for points (geom_jitter())
• nudge: shift points
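The jitter adjustment listed above can be sketched as follows, assuming a simulated stand-in for the course data (notes_demo is hypothetical, and the jitter width of 0.2 is chosen only for illustration):

```r
library(ggplot2)

# Hypothetical stand-in for the course data set (values are simulated)
set.seed(1)
notes_demo <- data.frame(
  note_totale = round(runif(60, min = 10, max = 100)),
  annee = rep(c("L1", "L2", "L3"), each = 20)
)

# Without adjustment, points of a given year sit on one vertical line
p_points <- ggplot(notes_demo) +
  aes(x = annee, y = note_totale) +
  geom_point()

# geom_jitter() is geom_point() with position = "jitter":
# points are spread horizontally by up to 0.2 on each side,
# and height = 0 leaves the marks themselves unchanged
p_jitter <- ggplot(notes_demo) +
  aes(x = annee, y = note_totale) +
  geom_jitter(width = 0.2, height = 0)

# Computed positions after the adjustment
jittered <- layer_data(p_jitter, 1)
```

Jittering only along the discrete axis keeps the plotted values of note_totale exact while reducing overplotting.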

Draw multiple plots within one figure

density <- ggplot(data = data) +
  aes(x = note_totale, fill = annee, col = annee) +
  geom_density(alpha = 0.6) +
  labs(title = "Notes au QCM",
       subtitle = "Les L2 sont très moyens.",
       x = "Note", y = "") +
  theme_minimal()

barplot <- ggplot(data) + aes(x = annee, fill = bac) +
  geom_bar(width = 0.5) +
  labs(title = "Séries de baccalauréat par année de Licence",
       subtitle = "Les filières ES et S sont très majoritaires.",
       x = "Note", y = "") +
  theme_minimal()

Draw multiple plots within one figure

library(ggpubr)
ggarrange(density, barplot, align = "h")

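[Figure: two plots side by side. Left: density curves by year, titled "Notes au QCM", subtitle "Les L2 sont très moyens."; x-axis: Note; legend annee: L1, L2, L3. Right: stacked bar chart titled "Séries de baccalauréat par année de Licence", subtitle "Les filières ES et S sont très majoritaires."; x-axis labeled Note (categories L1, L2, L3); legend bac: Bac ES, Bac S, Bac L, Bac STMG, Bac professionnel]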
