Upload
phungdat
View
217
Download
0
Embed Size (px)
Citation preview
InfographicscoursebookVersion1.2 (8February2014)FreeUniversityofBolzanoBozen–PaoloColetti
IntroductionThis book contains the infographics course’s lessons held at the Free University of Bolzano Bozen.
This book is in continuous development, please take a look at its version number, which marks important changes.
Disclaimers
This book is designed for novice data analyzers. It contains oversimplifications of reality and many technical details are purposely omitted.
TableofContentsINTRODUCTION ....................................................... 1
TABLE OF CONTENTS ................................................ 1
1. BASIC NOTIONS ............................................... 1
1.1. FUNDAMENTAL SCOPES .................................... 1 1.2. VARIABLES ..................................................... 2 1.3. SCALES .......................................................... 3
2. TRADITIONAL GRAPHS ..................................... 4
2.1. ONE NOMINAL VARIABLE .................................. 4 2.2. ONE SCALE VARIABLE ....................................... 5 2.3. NOMINAL VERSUS NOMINAL VARIABLES ............... 7 2.4. NOMINAL VERSUS SCALE VARIABLES .................... 8
2.5. SCALE VERSUS SCALE VARIABLES .......................... 8 3. SPECIFIC CHARTS ........................................... 13
3.1. DATA MAP ................................................... 13 3.2. TIME SERIES .................................................. 15
4. MISLEADING FACTORS ................................... 19
4.1. DISTRACTIBLE ELEMENTS ................................. 19 4.2. ELEMENTS IN THE WRONG PLACE ...................... 19 4.3. WRONG PROPORTIONS BETWEEN DATA AND SIZES 20 4.4. DIFFERENT UNITS OF MEASURE ......................... 21 4.5. NOT STARTING FROM ZERO .............................. 22 4.6. TWO AND THREE DIMENSIONAL GROWTH ........... 22
1. Basicnotions
1.1. Fundamentalscopes
Every time we need to represent anything, being a picturesque view through a painting, numbers through a table or, as it will be the case in this book, numbers through a picture, we must never forget what is the scope of our work. To help us in this task, the following questions can give good hints. They can be used every time we are faced with a representation to point out the bad aspects and to find out possible corrections and improvements.
Whatisrepresented?
Representing something without having a prior knowledge of what is going to be represented is clearly pointless, but there are numerous examples of representations built up only to impress the viewer, without any intention of really display something.
Paolo Coletti Infographics course book
Page 2 of 23 Version 1.2 (08/02/2014)
The topic which is displayed must be clear in the designer’s mind and also the point which wants to be underlined must be clear. All the image must be dedicated to these two tasks without adding worthless extras and without depriving these two points of any of their aspects for other secondary scopes.
Issomethingmisleading?
The most widespread problem in data visualization are misleading graphs, which deliberately or by ignorance display information in a wrong way to attract the viewer’s attention to the scope of the representation in an unfair way. The typical example of error are wrong proportions, displaying elements of the picture not in relative scale to each other. This may seems a childish trick, easy to discover; however, when it is done using multidimensional pictures it needs a careful eye to be spotted, a can be seen in section 0.
Issomethinguseless?
Too many graphical representations incorporate elements which do not have anything to do with the main scope, with the only task to embellish the graphics. Very often these elements distract the attention, when not moving the viewer’s focus completely away. In some cases they make parts of the representation incomprehensible, as can be the case when fancy fonts are used to write labels and numbers.
Issomethingmissing?
Very often a graphical representation needs also text and numbers. Even though most of it can be stored in the caption, it is necessary that axes contain the units of measure and their scale, that colors contain a proper legend and that, in case useful information is omitted, it be explicitly cited and explained.
A scaleless or unitless graph is useless even when all elements to compare are in scale, any difference which seems huge can instead be irrelevant. A representation with missing relevant details leads immediately the suspect that the designer wanted to hide something.
Last but not least a time indication is fundamental. Data will change in time while the representation stays the same (unless it is fully interactive) and can be looked at even many years after. With a missing time indication the representation is worthless, it can be referred to right now as well as to 3 millennia ago.
Canotherinformationbeadded?
This question has two aspects. The positive one is checking whether the graph can support other useful details to be stored inside without driving attention away from the main points and without decreasing its comprehensibility. The negative one is making sure that no other useless information is stored inside the graph, better keeping it simple than overloading it with useless elements.
1.2. Variables
We follow Stevens theory1 and we distinguish the data that we have in three big categories, which are treated in different ways both by statistics as well by graphical representations.
Scale variables, also called numeric variables, are data expressed in a numerical format, be it continuous or discrete, on which mathematical operations make sense. They are usually physical measures, such as time, age, weight, distance in Kilometers, length, number of children, GDP.
Nominal variables, also called categorical variables, are data which represent categories. No mathematical operation makes sense on them, even though sometimes numbers are used to identify the categories. Their only scope is dividing the population into categories. For example, country of residence, sex, degree course. Even though scale variable could divide the population into categories, the fact that they have many possible values, even infinite values when the variable is continuous, using them to divide the
1 Stevens, S. S. (1946); On the Theory of Scales of Measurement; Science, vol. 103, n. 2684, pages 677–680.
Infographic
Version 1.2
population each catego
The border ordinal variposition in can be donelements Liindicated wanswering 2
Ordinal varthey dividenumbers usscale variab
1.3. Scal
Unless extrepresentat
The linear slength, withthis scale, ireally so bigexample, benecessary to
The alternaexponentiabut on the value can bused, as it isto indicate growing exp
s course boo
(08/02/2014
into categoory.
between noiables, whicha running cone (we cannikert scale u
with 1, to ver2, but we can
iables can ae the populaser to represbles.
les
ravagant sotions of num
scale is the h this proporn such a wag, as in sectiecause the po render ver
ative scale lly proportioaxis still the
be zero nor ns more humaon the axis ponentially.
ok
4)
ries would r
ominal and h define onlyompetition isot say that used in quesry positive, innnot say that
lways be usation into casent the orde
olutions aremerical data.
standard chrtion fixed foy that the vion 4.5. Wheproblem itsery evident th
is the logaonal to its lene original vanegative, as an compreheat least thrIn logarithm
result in a h
scale variaby an order os clearly ordithe sixth onstionnaires, andicated witt it is twice a
sed as nominategories. Ser can some
absolutely
hoice, every or the entire iewer is noten the zero llf concentrae fact that th
arithmic scangth. In the lue is indicaits logarithmensible, but ee values, toic scale the u
uge number
bles is howevon the thing inal, as it dene is 3 timea set of preh 5. It is cleaas positive.
nal variablesometimes, wehow also ind
necessary,
interval repgraph. It is ot mislead intlevel is so faates on the dhe scale doe
le, where elogarithmic ated, not its m does not eit is equivaleo give the vusual startin
r of categori
ver not so sthey are refines the ords the secondetermined ar that answ
s, as they hawhen they duce a math
there are
presents a doften a good o thinking thar away and differences as not start fr
each intervascale the loglogarithm. G
exist. Traditioent to use aniewer the cog value is 1,
es, often wi
traight. And presenting. Fder but no md one). A tyanswers goering 4 is alw
ave a limitedhave severahematical me
two scales
istance whicidea to incluhat apparenit is not usend not on throm 0, using
al representgarithm of tGiven the feonally logarity other baseorrect impreas 0 is not re
P
ith one or z
in the midwFor examplemathematicaypical exampoing from veways more p
d number ofal values aneaning, they
s which are
ch is proporude also the tly huge diffeful for the phe absolute graphical to
ts a distanche values areatures of lothmic scale ie such as 2 oession that nepresentable
Paolo Coletti
Page 3 of 23
ero cases in
way we finde, the arrivalal calculationple is the 5‐ery negative,positive than
f values andd when the are used as
e typical in
rtional to itszero level inferences areproblem (forvalues), it is
ools or text.
ce which isre displayed,ogarithm, noin base 10 isr . It is vitalnumbers aree.
i
3
n
d l n ‐, n
d e s
n
s n e r s
s , o s l e
Paolo Colet
Page 4 of 23
2. Tra
How one or
2.1. One
To representechnically
The first chaxis the var
Millions of pbetween 1998
The secondnumber of informationcases.
The third chand when tNE, E, SE, etwhich are n
Number of sh
Possible var
• the bar p
• the line be confu
Sat
Sun
ti
3
ditiona
r several vari
enomina
nt one nominidentical.
oice is the ciable’s categ
patent applicat8 and 2007, sou
d choice is tcases in th
n on the abs
hoice is the they have a ctc. Since it isnot circular a
hops open per wdata) .
riations of th
plot, a versio
plot, which used with a
0
100
200
300
400
500Mon
Fri
algrap
iables can be
alvariabl
nal variables
olumn plot, gories and on
tions in Japanurce WIPO.
the pie charhat categorysolute numb
radar graphcircular meas considered t all, especia
week day (inve
he previous t
on of the colu
consists in amathematic
Tue
Wed
Thu
hs
e graphically
le
there are tw
sometimes in the vertica
, US, Europe
t. It depictsy. Comparingber of cases
, which is a ning, such ato be a very
ally in market
nted L
two basic typ
umn plot wit
column ploal graph, as
d
visualized d
wo basic dist
improperly cl axis the fre
Number of pin 2011, sou
s a circle divg it to the is lost, whil
good choiceas months, wy fancy graphting dimensi
Line plot of percmonth with
pe are:
th horizonta
t where a linon the horiz
0,0%
0,2%
0,4%
0,6%
0,8%
1,0%
1,2%
1,4%
A
epends stron
tinct choices
called histogquency of ca
participants by rce http://lsuag
vided into slcolumn ploe strong em
e when the nweek’s days, h, it is often aons.
centage of peoph respect to tot
l bars;
ne connects zontal axis t
Apr May
In
V
ngly on the v
and a bunch
ram. It repreases in each
race that attengcenterode.wo
lices, each ot we observ
mphasis is pu
number of chours, anguabused and
ple who use antal Apps’ users,
the top of ehere is not a
Jun Ju
nfographics
Version 1.2 (0
variable type
h of variation
esents on thcategory.
nd the workshoordpress.com.
one proportve that hereut on the pe
categories is lar directionapplied also
n App more tha source Flurry A
each columna numerical
l Aug
course book
08/02/2014)
e.
ns which are
he horizontal
p on Social Me
ional to thee the visualercentage of
at least fivens such as N, to variables
n 100 times a Analytics
. It must notvariable but
Sep
k
)
e
l
dia
e l f
e , s
t t
Infographics course book Paolo Coletti
Version 1.2 (08/02/2014) Page 5 of 23
simply categories. Special care must therefore be taken when using it and It is best suited when an ordinal variable appears on the horizontal axis, especially if it is a time related variable;
• the area plot, which is identical to the line plot but has the area below the line colored. This is many times a misleading element, as it gives the impression that values between two categories really exist (they do not, on the horizontal axis we have only categories not a continuous variable);
• all the three dimensional versions of these graphs are also commonly used. However, unless there is a strong necessity for an extra dimension, this is only extra graphics which moves the focus away from the main point and confuses the viewer.
2.2. Onescalevariable
Two graphs are used to represent the distribution of one scale variable, both requiring grouping procedures.
The most famous is the histogram. It is built deciding a priori a subdivision into intervals of the possible variable’s values and then counting how many cases falls into each interval. The result obviously depends on the choice of intervals, which should be chosen wisely, as too few intervals would result in a useless histogram, while too many intervals would produce an up and down histogram with many 0 cases or 1 case intervals.
Histogram with 5 intervals for 100 cases (invented data) Histogram with 50 intervals for 100 cases (same invented data)
$1,000 $2,000 $3,000 $4,000 $5,0000
10
20
30
$1,000 $2,000 $3,000 $4,000 $5,000
2
4
6
8
Paolo Colet
Page 6 of 23
Histogram wit
Intervals mthan the otintervals, sowrong impr
Less famoucentral quaevident whusually indicindicated w
B
One scale vWhenever wwe can usevariable’s vathe visual pnumber of c
$1,0000
4
8
12
ti
3
th 15 intervals fdat
ust have thethers it contaomething whression of an
s, but in somartiles througere the centcate where t
with symbols.
Boxplot of daily
variable has we are intere the columnalues insteadpower’s poincases is smal
$2,000 $
for 100 data (sata)
e same lengtains the casehich is clearl bigger impo
me cases mugh a rectangtral 50% of tthe central 9.
y log returns of
also anotherested in dispn plot (with ad of frequennt of view, ll. Very often
$3,000 $4,000
ame invented
th. For exames which sholy misleadingortance for th
ch more effigle cut by a the values’ d90% of the di
3M company q
r family of replaying the vall its variaticy of cases. Tto writing an in this case
$5,000
HistogramThere are
mple, if we buould be in 3 g. Moreoverhat part.
icient, is theblack line rdistribution iistribution lie
quoted at NASD
epresentatiovalues case bions) or the This kind of a table with e the pie char
m with the first i8 cases in the f
are
uild an histointervals anr, it is also 3
e boxplot. It iepresenting is. Then it dees. Other va
DAQ in 2012, so
ons which deby case and nradar plot wrepresentatithe numerirt is totally in
In
V
interval 3 timesirst box, 7 in thea looks much s
gram with ad thus it is htimes wider
is a visual rethe medianepicts two Tlues, called o
urce www.port
epicts a comnot the distrwith cases inion is perfectc values andnappropriate
nfographics
Version 1.2 (0
s wider (same ihe second one esmaller.
an interval 3 high as the sr, which give
epresentationn. It is thus iT‐shaped whoutliers or ex
tfolioprobe.com
mpletely differibution of alnstead of cattly equivalend works onle.
course book
08/02/2014)
nvented data). even though its
times widersum of threees the visual
n of the twoimmediatelyiskers whichxtremes, are
m.
erent aspect.l the values,tegories andnt, also fromly when the
k
)
s
r e l
o y h e
. , d m e
Infographic
Version 1.2
Country
United State
China
Japan
Germany
France
Brazil
United Kingd
Italy
Month
Avprec
Jan
Feb
Mar
Apr
May
Jun
2.3. Nom
The represeone with thside by sidealso one abthe sum for
In this casethird dimen
Average se
0
5
10
15
20
25
Enterntainm
s course boo
(08/02/2014
GDP p
es
dom
verage monthlycipitation (cm)
Bozen
1,33
1,06
1,54
3,37
4,2
6,04
minalver
entations of he more catee, line by linebove the other each catego
e it makes sension. Howev
ession frequencoperating syste
ment Games
iP
ok
4)
per capita (US
14.991.300.0
7.203.784.0
5.870.357.0
3.604.061.0
2.775.518.0
2.476.651.0
2.429.184.0
2.195.937.0
y in
Month
Ap
Jul
Ago
Sep
Oct
Nov
Dec
rsusnom
two nominaegories is dise using diffeer, when it mory of the fir
ense to use ver, still the i
cy per month ofem, source Flur
Lifestyle
Phone Android
D) in 2011
000.000.000
000.000.000
000.000.000
000.000.000
000.000.000
000.000.000
000.000.000
000.000.000
Average monthprecipitation (cm
in Bozen
6,27
4,9
3,79
4,6
3,51
1,53
minalvari
al variables aplayed as berent colors (makes sensest variable.
a 3D graph,information
f smartphone urry Analytics
News Social
d
hly m)
iables
are variants efore, while (in case line e to sum the
, where the on the exact
usage by
0,0E+00
2,0E+15
4,0E+15
6,0E+15
8,0E+15
1,0E+16
1,2E+16
1,4E+16
1,6E+16
UniteState
Sep
Oct
Nov
l networking
of the one nthe categorior radar gracategories a
second nomt proportion
Early‐stage esource “GEM G
edes
China Japa
GDP pe
0
1
2
3
4
5
6
7
Jan
Jul
Ago
Dec
nominal variaes of the othaphs turns oand thus give
minal variabls is lost.
entrepreneuriaGlobal adult po
an Germany Fran
er capita (USD)
Feb
Mar
Apr
May
Jun
P
able graphs.her one are ut to be appe also an info
e is represe
l activity in the opulation surve
nce Brazil UnKing
in 2011
Average montprecipitatio(cm) in Boze
Paolo Coletti
Page 7 of 23
. Usually therepresentedpropriate) orormation on
nted on the
UK by age grouy (APS) 2002‐20
itedgdom
Italy
thlynen
i
3
e d r n
e
up, 011”
Paolo Colet
Page 8 of 23
European, USPottelsbe
2.4. Nom
Also the nonominal varis necessaryvisibility pu
Boxplots of dcom
2.5. Scal
The traditiowhere eachwhenever tlines must nmodel), whobserved va
ti
3
S and Japanese erghe (2009) “B
prote
minalver
ominal versuriable’s categy: when the rposes can b
daily log returnmpanies, source
leversus
onal way to h case is reprthis order mnot be confuhich is a staalues.
patent costs, soeyond the prohection in Europe
rsusscale
us scale reprgories depictnumber of cbe misleading
ns at NASDAQ inwww.portfolio
sscaleva
represent twresented witakes sense, used with thatistical tool
ource Mejer, Mhibitive cost of e”
evariabl
resentations ted side by scases in eachg, while not
n 2012 for diffeoprobe.com
ariables
wo scale varth a symbol.a mathemate regressioncommonly
M., and Van patent
les
are variantsside or one oh category is doing it can
erent Histo
riables is theWhenever ttical graph cn line (or regused in co
Average sessioope
s of the scaover the othstrongly diffsqueeze a d
ograms, with vestudents in 2
e scatterplothe cases arecan be built gression curvmbination w
0
5
10
15
20
In
V
on frequency peerating system,
le variable rer. For histoferent, rescaistribution to
ertical axes resc2009, source ht
t, a graph we in horizontjoining the ve in case ofwith a scatte
nfographics
Version 1.2 (0
er month of sm source Flurry A
representatioograms a spealing the verto the horizon
caled, of heightttp://www.wus
with two numal axis valuesymbols witf a non‐lineaerplot to fo
course book
08/02/2014)
martphone usagAnalytics
on, with theecial warningtical axes forntal axis.
s by sex of CSE stl.edu.
merical axeses’ order andth lines. Thisar regressionorecast non‐
iPhone
Androi
k
)
ge by
e g r
s d s n ‐
id
Infographic
Version 1.2
Scatterplot o
Scatterpexperiment, s
When the nthe effect oincompreheis a function
Two‐input on
s course boo
(08/02/2014
of number of me
plot with lines osource chapter
number of scof two variaensible in thrn of the ones
ne‐output fuzzyToolbox exa
ok
4)
edicines used dsource “Relaz
of photosynthes 3 of book ISBN
cale variableables on a three dimensios on the hori
y controller dataamples for Mat
daily per 1000 inzione sanitaria –
sis in a biologicaN 978‐953‐307‐2
es grows to third one. It ons) with linezontal axes a
a, source Fully lab.
nhabitants and– Provincia aut
al 283‐8.
three, surfacis a tridimees and thus wand if the pr
Logic CircChinpe
average cost oonoma di Bolza
Scatterhttp://markw
e plot is bornsional versworks only wroper rotated
cle’s size represna, big light blueeople are from country is Luxe
of daily medicinano – 2005”
plot with regresadsworth.blogs
rrowed fromsion of scattwhen the vard view is ado
sent country’s pe one is India, bJapan, poorestmbourg), sourc
P
ne dose. Cases a
ssion line , souspot.it, data fro
m mathematiterplot (whicriable on theopted.
population (bigbig yellow one it country is Conce www.gapmi
Paolo Coletti
Page 9 of 23
are regions,
rce om OECD.
cs to displaych would bevertical axis
g red circle is is USA, oldest ngo, richest nder.org.
i
3
y e s
Paolo Coletti Infographics course book
Page 10 of 23 Version 1.2 (08/02/2014)
2.5.1. Bubblecharts
When the two variables on the horizontal axes do not share the same unit of measure, as for example population and GDP, it is more effective to use another representation, known as bubble chart. In this representation one of the scale variables is displayed increasing the size of bubbles, either the diameter or the area (even though a two dimensional distortion comes into play, see section 4.6). Moreover, colors can be used to represent a fourth nominal or scale variable.
Website www.gapminder.org has a tool to display several bubble charts which move with time, thus introducing also another dimension. However, it works only on their data.
In order to produce bubble charts on personal data Google Developers’ bubble chart can be used. It does not have an easy to use time feature, but its charts can even be inserted on websites. All its specifications are on https://developers.google.com/chart/interactive/docs/gallery/bubblechart (September 2013) and are summarized here below.
ProducingbubblechartswithGoogleDevelopers
First of all, in order to produce a bubble chart and being able to test, an appropriate HTML file should be created following these steps:
1. create a plain text file, open it with Notepad and copy inside it this code; <html> <head> <script type="text/javascript" src="https://www.google.com/jsapi"></script> <script type="text/javascript"> google.load("visualization", "1", {packages:["corechart"]}); google.setOnLoadCallback(drawChart); function drawChart() { var data = google.visualization.arrayToDataTable([ ['ID', 'Life Expectancy', 'Fertility Rate', 'Region', 'Population'], ['CAN', 80.66, 1.67, 'North America', 33739900], ['DEU', 79.84, 1.36, 'Europe', 81902307], ['DNK', 78.6, 1.84, 'Europe', 5523095], ['EGY', 72.73, 2.78, 'Middle East', 79716203], ['GBR', 80.05, 2, 'Europe', 61801570], ['IRN', 72.49, 1.7, 'Middle East', 73137148], ['IRQ', 68.09, 4.77, 'Middle East', 31090763], ['ISR', 81.55, 2.96, 'Middle East', 7485600], ['RUS', 68.6, 1.54, 'Europe', 141850000], ['USA', 78.09, 2.05, 'North America', 307007000] ]); var options = { }; var chart = new google.visualization.BubbleChart(document.getElementById('chart_div')); chart.draw(data, options); } </script> </head> <body> <div id="chart_div" style="width: 900px; height: 500px;"></div> </body> </html>
2. close Notepad and change the extension of the text file from .txt to .html; 3. double click on the file and open it in your browser, pressing “Allow blocked content”; 4. go back to the file’s directory, right click the mouse on the file and choose “Open with”, then
“Choose default program…”; 5. uncheck “Always use the selected program to open this kind of file”; 6. from the list of other programs select Notepad and press “OK”.
Infographic
Version 1.2
Now you sopen with Nthe same tyou reloadmodificatio
In order toedit the “vorganized irepresent oby square comma (exeach row, commas anorder:
1. nam2. hor3. vert4. colo
sca5. size
The previou var [ [ [ [ [ [ [ [ ]);
The differenon page 11.
Several optas in the tewhen omittothers are s
axis
bac'#00
bub- o- s- t
sf
For
heig
wid
chashoexa
colovari
s course boo
(08/02/2014
should haveNotepad to ime, in your it to checns work.
o modify thevar data” sein a table wone bubble brackets anxcept the lacolumns arnd must be
me of the burizontal coordtical coordinor of the buble variable, ce of the bubb
usly data user data = goo['ID', 'X', ['', 80, ['', 79, ['', 78, ['', 72, ['', 81, ['', 72, ['', 68, ;
nce between.
ions can be aext below onted, thus it isspecified. Th
sTitlesPositio
ckgroundColo0cc00'. For e
bble: followeopacity: a nustroke: bubbtextStyle: folsuch as textSfalse} example bub
ght: the heig
dth: the widt
artArea: folloould be resemple chartA
ors: a vector iable is scale
ok
4)
e the HTMLmodify it anr browser wck whether
e data you ection. Datawhere each and is encld followed ast one). Inre separated, exactly in
bble, dinate, mustnate, must bebble, optionacolor are dispble, optional,
e a nominal vogle.visuali'Y', 'Tempe167, 1136, 1184, 5278, 2200, 2170, 1477, 8
n these two
applied to thn page 12. As a good idee most usefu
on: 'out', 'in
or: name ofexample back
ed by curled umber betweble’s border’sllowed againStyle: {colo
bble: {opaci
ght in pixel o
h in pixel of
owed by curleerved for thArea:{left:20
of colors, eie, only startin
L file nd, at where your
must a are row osed by a nside d by this
t be scale, e scale, al. If it is a noplayed via gr, must be sca
variable as fozation.arrayrature'], 20], 30], 0], 30], 10], 00], 0]
graphs lies o
he graph, wrill of them ara to specify ul are:
n' or 'none'.
f the color kgroundColor
brackets andeen 0 and 1s color n by curled bor: 'black',
ity: 0.4, st
f the entire d
the entire dr
ed brackets e chart. Th0,top:0,widt
ther names ng and endin
ominal variaradient; ale.
ourth variablyToDataTable
only in the c
iting them inre completeonly the nee
. For example
between sinr: 'yellow'
d the followi
rackets with, fontName:
troke: 'red'
drawing area
rawing area
with left maese values th:"50%",hei
or codes, in ng colors are
ble, each cat
e. An exampe([
coloring of b
n the var optly optional aeded one, ev
e axisTitles
ngle quotati
ng sub param
h the specific‘Arial’, fo
, textStyle:
a. For examp
rgin, top macan be indiight:"75%"}
which the buneeded and
tegory uses a
ple with a sca
ubbles, see
tions sectionand have a rven though i
sPosition: '
on, as 'whit
meters
ations of thentSize: 9, b
: {fontSize:
ple height: 6
argin, how mcated in pix
ubbles are dd a gradient b
P
P
a different co
ale fourth va
colors param
n, separated reasonable din these exa
out'
te', or colo
e text inside bold: true,
: 8, color:
600
much of the dxel or perce
epicted. If fobetween the
Paolo Coletti
age 11 of 23
olor. If it is a
riable is
meter below
by commas,default valuemples many
r’s code, as
the bubble, italic:
'brown'} }
drawing areaentages. For
ourth ese two
i
3
a
w
, e y
s
a r
Paolo Coletti Infographics course book
Page 12 of 23 Version 1.2 (08/02/2014)
colors will be used. If fourth variable is nominal, a list of colors for all categories is needed. For example colors: ['red','#004411','black','orange']
hAxis: specifies the horizontal axis parameters, is followed by curled brackets and the following sub parameters - direction: 1 or ‐1 for reverse direction - format: a format string for numeric axis labels. For example format:'#,###%' to display
percentages and 3 decimal digits - gridlines: to specify color and count of the gridlines (use same color as background to have no
gridlines), enclosed by curled brackets. For example gridlines: {color: 'white', count: 4} - logScale: true or false - textPosition: 'out', 'in' or 'none' - textStyle: text style for axis’ font. See bubble.textStyle on page 11 for example. - ticks: a vector of numbers indicating where ticks should go. For example ticks: [5,10,15,20] - title: string with the axis’ title - titleTextStyle: style of the title, for example titletextStyle: { fontSize: 9, color: ‘red’} - maxValue and minValue: maximum and minimum value for the axis For example hAxis: {title: 'Life', titleTextStyle: {color: 'black', bold: true}, textStyle: {fontName: ‘Arial’}, minValue: 55, maxValue: 90, ticks: [60, 65, 70, 85], maxValue: 90}
vAxis: specifies the vertical axis parameters, same as hAxis
legend: specifies the legend parameters, is followed by curled brackets and the following sub parameters - position: 'right', 'top', 'bottom', 'in' or 'none' - alignment: 'start', 'center' or 'end' - textStyle: legend’s text style. See bubble.textStyle on page 11 for example. For example legend: {position: 'top', textStyle: {color: 'blue', fontSize: 16}}
sortBubblesBySize: if true it sorts the bubbles by size so that the smaller bubbles appear above the larger bubbles. If false, bubbles are sorted according to their order in the data
title: text to display above the chart. For example title: 'Chart of revenues by cost'
titlePosition: 'in', 'out' or 'none'
titleTextStyle: title text style. See bubble.textStyle on page 11 for example.
tooltip: specifies the tooltip parameters, is followed by curled brackets and the following sub parameters - textStyle: tooltip’s text style. See bubble.textStyle on page 11 for example. - trigger: 'focus' to display the tooltip when the user hovers over an element, 'none' to avoid
displaying the tooltip.
For example, some parameters can be var options = { axisTitlesPosition: 'out', backgroundColor: 'white', bubble: {opacity: 0.4, stroke: 'blue', textStyle: {fontSize: 11, color: 'brown'} }, chartArea:{left:100,top:50,width:"70%",height:"60%"}, colors:['red','#004411','orange'], hAxis: {title: 'Life', titleTextStyle: {color: 'black'}, minValue: 55, maxValue: 90, ticks: [60, 65, 70, 75, 80, 85]}, vAxis: {title: 'Fertility', logScale: true, maxValue: 6, ticks: [1,2,3,4,5], gridlines: {color: 'white'} }, height: 800, width: 1200, legend: {position: 'bottom', textStyle: {color: 'blue', fontSize: 16} }, title: 'Correlation between life expectancy, fertility rate and population of some world countries (2010)', tooltip: {trigger: 'focus'} };
Infographic
Version 1.2
3. Spe
Several fieldused to disexpert eyes
3.1. Dat
When one ovariables usvariables, wand depict while if thegeographic
Source A
Other elemelement’s sfrom variouthe map itsrelations isincomprehe
s course boo
(08/02/2014
cificch
ds make usesplay particus then standa
tamap
of the involvsing its geogwe can take pthe chart one other varialocations bu
Annals of Intern
ments can besize as well aus points of tself, we muss 2 to the ensible.
ok
4)
harts
e of particulular graphs ward ones.
ved variable graphic coorprofit of the n the map itble does nout drawn abo
nal Medicine, vo
e added to das a directiothe map. Wht unavoidabpower of
ar variables which turns
is a geograprdinates. Hoexistence ofself. If the ot exist we caove a map.
ol. 133, n. 2, pa
data maps, on using the henever disply cut those the numbe
for which dout to be m
phic locationowever, instef a well‐knowother variablan build pra
ages 161‐162.
using the famap’s orien
playing relatielements w
er of eleme
data have wemore approp
it can theoread of usingwn map, imme is a scale ctically a sca
Black18
act that we ntation. A tyons among e
which are tooents and ca
ell‐known prpriate and m
retically be cg methods fmediately recvariable, weatterplot usi
k lines are chole54 as drawn by
9781
can display pical exampelements of o small, as on easily ma
P
P
roprieties, wmore unders
converted infor three orcognizable be can build ang symbols
era cases 19 Auy John Snow, so1130088281.
a proportiople is the flothe map is d
otherwise theake the en
Paolo Coletti
age 13 of 23
which can bestandable by
to two scale more scaleby the views, color scale,to mark the
ug to 30 Sept ource ISBN
on using thews of trafficdisplayed one number ofntire picture
i
3
e y
e e , , e
e c n f e
Paolo Coletti Infographics course book
Page 14 of 23 Version 1.2 (08/02/2014)
Exports by sea of French wines in 1864, source C.J. Minard 1865. 1999 volume of minutes of voice telecommunication. Only the principal route pairs are shown to avoid
cluttering the map. Circular symbols on capital cities represent the country's total annual outgoing traffic,
source http://mappa.mundi.net and www.telegeography.com
3.1.1. Statsilksoftware
A very efficient free software which gives the opportunity to load a map below and data from a spreadsheet, displaying them over the map in a variety of ways including side graphs is developed by Statsilk www.statsilk.com and comes in three packages:
StatPlanet, a web‐based and desktop software for visualizing spatial data through interactive maps, graphs and charts
StatTrends, a web‐based and desktop software for visualizing any kind of data through interactive graphs and charts
StatWorld/StatPlanet, a web‐based and desktop software for exploring world statistics from various international organizations through interactive visualizations.
These programs are interactive, meaning that graphs’ proprieties can be changed with a simple click and web‐based, meaning that they can work through a browser connected to Statsilk’s website.
Infographics course book Paolo Coletti
Version 1.2 (08/02/2014) Page 15 of 23
3.2. Timeseries
3.2.1. Financialchart
Financial data use scale variables time, low price, high price, open price and close price and the main interest is usually depicting prices as a function of time. They have the property that low is always smaller than all the others and high is always larger than all the others. Therefore they can be represented using a candlesticks chart, where time is on the horizontal axis, the central rectangle has values open and close as extremes (and is white or green when open is the base, black or red then close is the base) and the vertical line goes from low to high. In this way five different scale variables can be efficiently represented together. Sometimes traded volume or other indicators is depicted as a daily column plot below the chart, using the same horizontal time line.
Special care must be taken when producing or observing a financial graph for missing values, either because the security is not traded (market closed or security suspended) or because of data errors; in cases like this it is preferable to leave a blank space if just one of the four prices is missing.
Paolo Colet
Page 16 of 2
A similar represence of
3.2.2. Gan
A very altermoves on wconnecting part procee
When dealiHenry Gant
ti
23
Telecom ItaliSundays w
presentationf other indica
nttchart
rnative way with its planthe starting
eding.
ng with projt.
ia daily open, cwithout leaving
n is typical foators such as
to display aned activitieg and ending
ject’s planni
lose, low, high any blank space
or weather, as precipitatio
starting andes) is using og points with
ng, this grap
prices, source Ye, other marke
as is shares ton and humi
d ending timone dimensioh a line. The
ph uses bars
Yahoo!. Horizot closed days h
the high anddity.
me of an eveon as time asteepness o
and takes t
In
V
ntal axis skips Save a blank spa
d low of finan
ent which moand the otheof the line in
he name of
nfographics
Version 1.2 (0
Saturdays and ace instead.
ncial data, a
oves ahead er one as mondicates how
Gantt chart,
course book
08/02/2014)
s well as the
(in space oroving space,w fast is that
, in honor of
k
)
e
r , t
f
Infographic
Version 1.2
Timetable of t
3.2.3. Na
Figurative mfundamentasame chart the army’s battles andexception oin geograph
Other elemdedicated tinserted thr
s course boo
(08/02/2014
trains from Par
rrativegra
map of Frenal example oseveral scalsize and th the crossedof the timelinhical and not
ments can beto the timelirough other
ok
4)
Ga
ris to Lyon oper
aph
nch army inof a narrativle variables: e weather td rivers. Evene which is ot temporal sc
e representeine then themeans.
ntt chart for stu
rating in 1880, r
n Russian cave graph. Ththe coordintemperatureerything whionly sketchecale).
ed on a figue geographic
udy activity for
red line is TGV
ampaign 181his graph manates of the e. Moreoverch is relevad for the ret
urative map,c information
r ADA course 20
train in 1981, s
12‐1813 draanages to dearmy’s posit, it includesnt for the ctreat and is n
, bearing in n must be s
013.
ources E.J. Mar
awn by C.J. epict in a vetion includinuseful elemampaign is not in scale
mind that queezed in t
P
P
rey 1885 and E
Minard in ery efficient ng all possibments such reported, w(as the horiz
if the horizothe vertical
Paolo Coletti
age 17 of 23
.R. Tufte 2001.
1869 is theway on thele branches,as the main
with the onlyzontal axis is
ontal axis isaxis only or
i
3
e e , n y s
s r
Paolo Coletti Infographics course book
Page 18 of 23 Version 1.2 (08/02/2014)
Figurative map of the men losses of the French army in the Russian campaign 1812‐1813, source C.J. Minard 1869.
A new chart of history, source Lectures on History and General Policy by J. Priestley 1788.
Infographics course book Paolo Coletti
Version 1.2 (08/02/2014) Page 19 of 23
4. Misleadingfactors
4.1. Distractibleelements
Any element which is not directly related to the focus of the representation should be avoided, especially when it is only an embellishment of the chart. The typical example is inserting pictures which do not help comprehensibility inside bar plots.
As can be seen in these two pictures, the elements added in the column bars do not help the comprehensibility at all. They do not represent, as they could be, neither the years nor the three analyzed variables in the second graph, where it could be much better displaying a picture related to the variable itself. Their only task is to distract the viewer in the first one with the help of a three‐dimensional object (see section 4.6) and in the second one from the non‐zero start of vertical axis (see section 4.5).
Source Beautiful Evidence by E. Tufte. Source Day Mines 1974 Annual Report.
In this picture instead a fancy left to right perspective effect has been introduced. Its effect on a better comprehensibility of the data is none, while it makes comparing the bars a very difficult task and gives the impression of a much larger increase.
Black columns represent New York State aid to localities in billions of USD, total column represent total budget, source New York Times 1 Feb 1976.
4.2. Elementsinthewrongplace
The position of elements must be planned with special care, as the viewer can induce that it contains a special meaning. In case it does not, this must be evident. For example, in one of the versions of the worldwide famous billion dollar gram, which depicts some of the world’s expenditures on a two dimensional scale. The graph is without any doubt very impressive and efficient in comparing some
Paolo Coletti Infographics course book
Page 20 of 23 Version 1.2 (08/02/2014)
expenses to others. However, apart from the impact on the viewer, there is no need to use two dimensions, one dimension is enough as the variable to be displayed is only one. The only advantage in using two dimensions is the possibility to easily slip insert some rectangles inside other ones. However, this automatically induce into the viewer the idea that a rectangle is part of the outside one, as it is correctly depicted for Online Advertising as part of Worldwide Advertising Spend. However, the viewer is expected to see the same for Iraq War, but the three rectangles are not in the total one, giving the impression of a larger expense.
Upper part of billion dollar gram by D. McCandless, source www.informationisbeautiful.net
4.3. Wrongproportionsbetweendataandsizes
Respecting proportions between numerical data and graphical elements’ sizes is one of the basic principles of any visual representation. In the billion dollar gram the proportion to be respected is between numerical data and areas, but the Facebook area, which correspond to 15 billion USD, should be less than one third of Bill Gates area, which evidentially is not as can be estimated with a simple ruler. The picture below shows instead a billion dollar gram where no evidently wrong proportion or no wrongly placed element seem to appear.
Infographic
Version 1.2
4.4. Diff
Data with avery clear oexamples bpicture. In b
s course boo
(08/02/2014
Upper par
ferentun
different unor unless thebelow the laboth cases a
ok
4)
rt of billion doll
nitsofme
nit of measurey are propest category rescaling wo
ar‐o‐gram by D
easure
re may neveerly rescaledcontains a sould have be
D. McCandless,
r be put oned. Apart fromshorter sameen very app
source www.in
e aside the otm evident uple’s unit, wropriate.
nformationisbe
ther, unless tnits of measwhich is evid
P
P
eautiful.net
their units osure’s differdent only in
Paolo Coletti
age 21 of 23
of measure isence, in thethe second
i
3
s e d
Paolo Colet
Page 22 of 2
Source Scien
4.5. Not
Starting theemphasize and possiblfrom zero wnumber witnegative vafirst column
4.6. Two
Whenever ain the propoif the horizosizes and aobviously p4.1, and in barrel intro
A three dimstandard comuch taller
Much subtlwater comphere is thatplace whereto have a voat any terre
ti
23
nce Indicators 1
tstarting
e axis from small differey underlinedwithout anythout any nealues, and thn of the seco
oandthr
a single scaleortions. The ontal base is reas. If alsoproportional the two pictduces an eve
Source Tim
mensional grolumn is exp and with a l
er is the gropared to Eart the vast mae water is neolume, and nestrial globe
1974, National S
gfromzer
a number ences (and ud. In the secoy notice norecessity, since three plot
ond plot is ne
reedime
e variable is viewer is fackept consta the horizonto the squatures belowen further m
me 9 April 1979
owth on an ploded into aarger volum
owth effect oth’s volume.ajority of waeeded. So it snot to the enimmediately
Science Founda
ro
which is nousing logaritond example number once the variabts have a diffegative.
nsionalg
representedced up with nt, it is possintal base of re of the da. In the first
misleading eff
9.
abstract elea three dimee than the tw
of the next s. The volumeater is scatteshould compntire Earth’s vy shows that
ation.
ot zero is a hmic scale ie of section 4n the verticables represenferent scale
growth
d using a twotwo differenible to respethe graphicta, as can bet one, the fafect, as the p
ement is dispensional objwo dimensio
second grape’s proportioered on 20 Kpared to the volume as wwater is far
Source
legitimate cs not appro4.1 the threeal axes, but nted by the fand zero lev
o dimensionnt scales: theect both the pcal element e seen in thct that the eperceived vo
Source W
played in theect which conal ones.
h, which depons are probKm of Earth’ssurface of E
water will nevfrom scarce
In
V
New York Time
choice whenpriate), but e column plothey start ffirst and thevel. This also
al image spee vertical sizeproportions is rescaled, e man in theelement is alume is the c
Washington Pos
e next picturlearly gives t
picts the volbably corrects surface departh, eventuver flow deeon Earth’s s
nfographics
Version 1.2 (0
es 8 August 197
never it is nit must be vots not only from a largee third plots o hides the f
ecial care mue and the areamong datathen the are first picturalso a three cube of the l
st 25 April 1978
re, where a the impress
lume of watt, but what ipth, which isually multipliply inside Easurface.
course book
08/02/2014)
78.
necessary tovery evidentdo not startely negativecannot havefact that the
ust be takenea size. Only and verticalea becomesre of sectiondimensionalinear size.
8.
column of aion of being
er and freshs misleadings exactly theed by 20 Kmarth. Looking
k
)
o t t e e e
n y l s n l
a g
h g e m g
Infographic
Version 1.2
Sou
s course boo
(08/02/2014
rce New York T
ok
4)
Times, 19 Decemmber 1978 Souurce www.8020
P
P
0vision.com
Paolo Coletti
age 23 of 23
i
3