Hands-on session: Python
Research Data Visualisation Workshop
Leighton Pritchard (1, 2, 3)
(1) Information and Computational Sciences; (2) Centre for Human and Animal Pathogens in the Environment; (3) Dundee Effector Consortium. The James Hutton Institute, Invergowrie, Dundee, Scotland, DD2 5DA
Acceptable Use Policy
Recording of this talk, taking photos, discussing the content using email, Twitter, blogs, etc. is permitted (and encouraged), providing distraction to others is minimised.
These slides will be made available at http://www.slideshare.net/leightonp
Table of Contents
1 Introduction
  Why listen to me?
  What is visualisation?
  Elementary perceptual tasks
2 Evidence-based representation
  What representations work best?
  Pie charts
  Bars and lines
  Scatterplots
  Interactive plots
3 Hands-on session
  Python libraries
  Exercises
  Let’s get started
What I do
Computational biologist (1996-present)
  protein sequence-structure-function (1996-1999)
  yeast metabolism (1999-2003)
  plant pathology (2003-present)
Large datasets
  sequence/genomic
  metabolomics
  statistics
  geographical
Visualisation as communication to (wet) biologists
  protein structures
  metabolic flux
  comparative genomics/evolution
  statistical plots
Big data...
GenomeDiagram (a, b, c)
(a) Pritchard et al. (2006) Bioinformatics doi:10.1093/bioinformatics/btk021
(b) Toth et al. (2006) Ann. Rev. Phytopath. doi:10.1146/annurev.phyto.44.070505.143444
(c) http://biopython.org
Functional adaptation in Pba (a, b)
(a) Toth et al. (2006) Ann. Rev. Phytopath. doi:10.1146/annurev.phyto.44.070505.143444
(b) http://biopython.org
GenomeDiagram/SciArt (a, b, c)
(a) Pritchard et al. (2006) Bioinformatics doi:10.1093/bioinformatics/btk021
(b) Shemilt (2009) in “Digital Visual Culture: Theory and Practice”, ISBN 978-1-84150-248-9
(c) Shemilt (2010) in “Art Practice in a Digital Culture”, ISBN 978-0-7546-7623-2
Influence
  Free open-source comparative genomics visualisation library
Impact
  Artwork (prints, audio-visual installation) exhibited in UK and internationally
Comparative metabolism (a, b, c)
(a) Biopython KGML/KEGG visualisation module
(b) https://github.com/widdowquinn/Notebooks-Bioinformatics
(c) http://biopython.org
PyANI: prokaryote classification (a, b)
(a) http://widdowquinn.github.io/pyani/
(b) Pritchard et al. (2016) Anal. Methods doi:10.1039/C5AY02550H
[Figure: heatmap of ANIm_percentage_identity (colour scale 0.00 to 1.00) for Pectobacterium, Dickeya, Pantoea and Erwinia genomes; rows and columns are labelled with genome names]
Data visualisation is art and science
... storytelling in pictorial or graphical format
Stories to yourself (sense-making)
  summarise big stories quickly
  data exploration and mining
  identify areas/items of importance
  find relationships and patterns
Stories to others (communication)
  present your interpretation of data
  make a specific point
  assert a relationship or pattern
  demonstrate significance
Cautionary tales
  avoid distortion
  make the reader think about the data, not the presentation
  avoid chartjunk (excessive decoration)
  aim for high data:ink ratio
The point of data visualisation (a)
(a) https://en.wikipedia.org/wiki/Data_visualization
Where does visualisation belong?
Communicating effectively
Understand the data
  size
  cardinality
  meaning
  relationships
Know (or be receptive to) the message
  what does pictorial representation mean?
  match graphical relationships to data relationships
Know your audience
  how do people process pictorial information?
  how does your audience process information?
  domain-specific representations
A model of communication (a)
(a) Randy Olson (2009) Don’t Be Such a Scientist
Elementary Perceptual Tasks (a)
(a) Cleveland & McGill (1984) J. Am. Stat. Ass.
The most basic visual tasks:
Implementations (a)
(a) http://www.datavizcatalogue.com/
Position: common scale
Scatterplot
Bar Chart
Angle
Pie Chart
Do(ugh)nut Chart
Curvature
Arc Diagram
Chord Diagram
Gestalt principles (a)
(a) https://emeeks.github.io/gestaltdataviz
proximity: close objects perceived as groups
enclosure: bounded objects perceived as groups
continuity: aligned objects perceived as continuous
similarity: similar attributes perceived as groups
closure: open objects perceived as complete
connection: connected items perceived as groups
What works best? Experiment (a, b)
(a) Cleveland & McGill (1984) J. Am. Stat. Ass.
(b) Heer & Bostock (2010) CHI 2010
Empirical measurements of interpretation
Subjects shown graphs representing same data
(log2) error in subjects’ accuracy compared by graph type
Judgement types
1-3: Position on a common scale (bar chart, stacked bar chart)
4-5: Length encoding (stacked bar chart)
6: Angle (pie chart)
7-9: Area (bubble chart, aligned rectangles, treemap)
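The error measure used in these experiments can be sketched in a few lines. Cleveland & McGill report log2(|judged percent - true percent| + 1/8), where the 1/8 offset keeps the logarithm defined when a subject's answer is exact. The function name and the sample judgements below are illustrative assumptions, not values from either paper.

```python
import math


def log_abs_error(judged_percent, true_percent):
    """Cleveland & McGill style error: log2(|judged - true| + 1/8)."""
    return math.log2(abs(judged_percent - true_percent) + 0.125)


# Hypothetical judgements of a true value of 40%.
for judged in (40, 42, 48):
    print(judged, round(log_abs_error(judged, 40), 2))
```

An exact answer scores -3 (the log of the 1/8 offset), and each doubling of the absolute error adds roughly one unit, which is why the results are compared on a log scale.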
What works best? Result (a, b)
(a) Cleveland & McGill (1984) J. Am. Stat. Ass.
(b) Heer & Bostock (2010) CHI 2010
We have inherent biases that can distort information recovered
Position > Angle ≈ Length > Area
Accuracy plateaus as charts increase in size
Gridlines improve accuracy
Aspect ratios affect area judgements (squares worst)
People hate pie charts
http://www.storytellingwithdata.com/blog/2011/07/death-to-pie-charts
especially Edward Tufte
“A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them [...] pie charts should never be used.” (“The Visual Display of Quantitative Information”)
“E pur si muove...” (a, b)
(a) Eells (1926) J. Am. Stat. Ass.
(b) Simkin & Hastie (1987) J. Am. Stat. Ass.
For proportions of a whole:
Pie charts read as accurately as bar charts
As number of components in the chart increases, bars are less efficient than pie charts
Bar charts are bad... mmmkay?
There is an ongoing backlash against bar charts (and I’m not picking on Nick, he just tweets a lot...)
But are they really that bad?
Interpretation of bars and lines (a)
(a) Zacks & Tversky (1999) Mem. Cognit.
People interpret bars and lines differently
Experiment 1: In absence of context (arbitrary X, Y)
bars: discrete comparison (24:0)
lines: trend assessment (0:35)
Interpretation of bars and lines (a)
(a) Zacks & Tversky (1999) Mem. Cognit.
People interpret bars and lines differently
Experiment 2: With context (discrete or continuous data)
Bars vs. lines
People naturally interpret bar charts as categorical data
People naturally interpret line graphs as trends
Using bars for trend data or lines for categorical data can mislead the reader
Bar charts can mislead (a)
(a) Weissgerber et al. (2015) PLoS Biol. doi:10.1371/journal.pbio.1002128
Do these bars differ in value?
Bar charts represent data as a single point: lossy compression.
Could different datasets give the same bar chart?
Bars are lossy compression (a)
(a) Weissgerber et al. (2015) PLoS Biol. doi:10.1371/journal.pbio.1002128
Bars hide detail:
Number of data points
Variance of data points
Distribution of data points (outliers, etc.)
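A small sketch of why a bar is lossy: the three samples below are invented for illustration (they are not data from Weissgerber et al.) and were chosen so that all share the same mean. A bar chart of the means would draw three identical bars, although sample size, spread and outliers differ.

```python
from statistics import mean, stdev

# Hypothetical samples, constructed so that every mean is exactly 10.
samples = {
    "tight": [9, 10, 10, 10, 11],
    "outlier": [5, 5, 5, 5, 30],
    "small": [8, 12],
}

for name, data in samples.items():
    # The bar height only encodes mean(data); n and sd are invisible in the bar.
    print(name, "n =", len(data), "mean =", mean(data), "sd =", round(stdev(data), 2))
```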
Bars may mislead on statistics (a)
(a) Weissgerber et al. (2015) PLoS Biol. doi:10.1371/journal.pbio.1002128
Bars may imply incorrect test statistics:
Overlaps, outliers, covariates, sample sizes masked
Bars for paired data (a)
(a) Weissgerber et al. (2015) PLoS Biol. doi:10.1371/journal.pbio.1002128
Bars imply independence of data:
Better than bar charts?
Bar chart with SE bars suggests group 2 is highest
Better than bar charts?
Bar chart with SD bars suggests there is overlap
Better than bar charts?
Univariate scatterplots show sample sizes, outliers, variance
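The comparison on these slides can be reproduced roughly as follows with Matplotlib. This is a minimal sketch under stated assumptions: the two groups are made-up numbers, the `Agg` backend is selected so the script runs without a display, and the output filename is arbitrary.

```python
import math
from statistics import mean, stdev

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display available
import matplotlib.pyplot as plt

# Hypothetical measurements for two groups (not data from the slides).
groups = {
    "group 1": [4.1, 4.8, 5.0, 5.3, 5.9, 6.2],
    "group 2": [3.0, 5.5, 5.8, 6.1, 6.4, 11.0],
}

fig, (ax_bar, ax_dot) = plt.subplots(1, 2, sharey=True, figsize=(8, 4))
for i, (name, values) in enumerate(groups.items()):
    sd = stdev(values)
    se = sd / math.sqrt(len(values))  # standard error of the mean
    # Left panel: bar of the mean with SE error bars.
    ax_bar.bar(i, mean(values), yerr=se, capsize=4,
               color="lightgrey", edgecolor="black")
    # Right panel: every observation, with the mean marked as a short line.
    ax_dot.scatter([i] * len(values), values, color="black", alpha=0.6)
    ax_dot.hlines(mean(values), i - 0.2, i + 0.2, colors="red")

for ax, title in ((ax_bar, "Bar chart (mean, SE)"),
                  (ax_dot, "Univariate scatterplot")):
    ax.set_xticks(range(len(groups)))
    ax.set_xticklabels(list(groups))
    ax.set_title(title)

fig.savefig("bars_vs_points.png")
```

Swapping `yerr=se` for `yerr=sd` gives the SD version of the bar chart; the right-hand panel needs no such choice because it shows the observations themselves.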
Any chart can mislead (a)
(a) Spurious Correlations, tylervigen.com
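One way such spurious correlations arise is that any two series that both drift upwards over time will correlate strongly, whether or not they are related. The sketch below uses two invented yearly series (the names and numbers are illustrative assumptions, not figures from tylervigen.com) and a hand-written Pearson coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Two made-up series over ten years; each rises with a little wobble.
cheese_kg = [10 + 0.5 * i + (0.3 if i % 2 else -0.3) for i in range(10)]
phd_count = [200 + 12 * i + (5 if i % 3 else -5) for i in range(10)]

# A shared upward trend is enough to give a coefficient close to 1.
print(round(pearson_r(cheese_kg, phd_count), 3))
```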
Scatterplots (a)
(a) http://www.datavizcatalogue.com/
Scatterplots should be awesome:
Positions on common scale (lowest error representation)
Show all data: outliers, sample sizes, trends, etc.
Framing affects interpretation (a)
(a) Cleveland et al. (1982) Science doi:10.1126/science.216.4550.1138
Point cloud size affects interpretation of correlation (more diffuse interpreted as lower correlation coefficient)
Interpreting correlation is difficult (a)
(a) Fisher et al. (2014) PeerJ doi:10.7717/peerj.589
People don’t judge significance well
47.4% of significant relationships correctly classified
74.6% of non-significant relationships correctly classified
Latency affects usage
Increasing latency to 0.5s:
decreases user activity
decreases dataset coverage
reduces rate of hypothesis generation
changes data exploration strategy
reduces future interaction with other graphics
Python libraries
Matplotlib http://matplotlib.org/
Seaborn https://stanford.edu/~mwaskom/software/seaborn/
ggplot for Python http://yhat.github.io/ggplot/
Bokeh http://bokeh.pydata.org/
Exercise choices (a)
(a) https://github.com/widdowquinn/Teaching-Data-Visualisation
One-variable, continuous data
Grammar of Graphics
Interactive map with bokeh
Two-variable, continuous x, y data
Arrays, colormaps, surface plots
Making movies
Licence: CC-BY-SA
By: Leighton Pritchard
This presentation is licensed under the Creative Commons Attribution ShareAlike license https://creativecommons.org/licenses/by-sa/4.0/