UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA SCHOOL OF SOFTWARE ENGINEERING OF USTC
Data Visualisation & Interpretation
The art of reading datasets
Devert Alexandre
School of Software Engineering of USTC
14 February 2012
Descriptive statistics
Descriptive statistics help to give a general summary of data
Mean
Example of a descriptive statistic: the arithmetic mean
$$\bar{a} = \frac{1}{n} \sum_{i=1}^{n} a_i$$
$$\bar{a} = \frac{1}{n}(a_1 + a_2 + \dots + a_n)$$
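As a minimal sketch, the formula can be written directly in Python. The name `arithmetic_mean` and the sample values are illustrative choices; `statistics.mean` from the standard library computes the same quantity.

```python
import statistics

def arithmetic_mean(data):
    """Return (a_1 + ... + a_n) / n for a non-empty sequence."""
    if len(data) == 0:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(data) / len(data)

samples = [6, 1, 7, 9, 6, 3, 4, 5, 2]   # the values sum to 43, n is 9
print(arithmetic_mean(samples))
print(statistics.mean(samples))          # library equivalent
```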
The mean is also defined in $\mathbb{R}^n$ ⇒ it is the geometric center (centroid) of the samples
Mean computation
Do you think it is easy to compute the mean?
0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1
A naive summation algorithm will return this
    >>> 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1
    0.8999999999999999
An accurate summation algorithm will return this
    >>> import math
    >>> math.fsum([0.1] * 9)
    0.9
Algorithms like the Kahan summation algorithm or the Shewchuk summation algorithm (the one behind math.fsum) reduce the numerical error
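    def KahanSum(data):
        s = 0.0   # running sum
        c = 0.0   # compensation for the low-order bits lost so far
        for x in data:
            y = x - c          # apply the correction carried from the previous step
            t = s + y          # low-order bits of y may be lost here
            c = (t - s) - y    # recover what was lost (with opposite sign)
            s = t
        return s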
Listing 1: Kahan summation
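To see the effect of the compensation, here is a small sketch that compares a plain loop, Kahan summation and `math.fsum` on a dataset chosen to be awkward: one large value followed by values too small to register when added one by one. The helper names and the dataset are assumptions made for this illustration. A hand-written loop is used for the naive sum because recent Python versions already compensate inside the built-in `sum`.

```python
import math

def naive_sum(data):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in data:
        total += x
    return total

def kahan_sum(data):
    """Compensated (Kahan) summation."""
    total = 0.0
    compensation = 0.0
    for x in data:
        y = x - compensation
        t = total + y
        compensation = (t - total) - y
        total = t
    return total

# 1e-16 is below half the spacing of floats around 1.0,
# so adding it to 1.0 on its own changes nothing.
data = [1.0] + [1e-16] * 10

naive = naive_sum(data)
kahan = kahan_sum(data)
reference = math.fsum(data)   # correctly rounded sum

print("naive:", naive)
print("kahan:", kahan)
print("fsum: ", reference)
```

The naive loop never leaves 1.0, while the compensated sum accumulates the small terms and lands on (or right next to) the `math.fsum` result.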
Central tendency
The mean is a measure of central tendency ⇒ the main behaviour, the main value of some phenomenon
Mean robustness
The mean is not a robust estimator of the central tendency: a single extreme value can shift it arbitrarily far
Median
The median is the value such that 50% of the values are higher and 50% of the values are lower
a = [6, 1, 7, 9, 6, 3, 4, 5, 2]
sorted: a = [1, 2, 3, 4, 5, 6, 6, 7, 9]
median: $\tilde{a} = 5$
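With an even number of values, the median is the mean of the two middle values
a = [6, 1, 7, 9, 6, 3, 4, 8, 5, 2]
sorted: a = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9]
median: $\tilde{a} = \frac{1}{2}(5 + 6) = 5.5$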
Median computation
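To compute the median, you can
1. sort the list of samples
2. pick the middle:
   • if the size n is odd → $\tilde{a} = a_{\frac{n+1}{2}}$
   • if the size n is even → $\tilde{a} = \frac{1}{2}\left(a_{\frac{n}{2}} + a_{\frac{n}{2}+1}\right)$
Note that these formulas are for indexes starting from 1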
Let's code some Python
    def median(data):
        data = sorted(data)   # sorted copy, the caller's list is left untouched
        n = len(data)
        if n % 2 == 0:
            m = n // 2        # integer division: indexes must be integers
            return 0.5 * (data[m - 1] + data[m])
        else:
            return data[(n - 1) // 2]
Let's try it
    >>> a = [6, 1, 7, 9, 6, 3, 4, 5, 2]
    >>> median(a)
    5
The median has an equivalent in $\mathbb{R}^n$ ⇒ the median center
Compute the median for each dimension to get the median center
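The per-dimension rule can be sketched in a few lines. The function name `median_center` and the sample points are hypothetical; `statistics.median` is the standard library median.

```python
import statistics

def median_center(points):
    """Component-wise median of a list of points of equal dimension."""
    # zip(*points) regroups the coordinates dimension by dimension.
    return tuple(statistics.median(coords) for coords in zip(*points))

# Five 2-D points; the last one is far away along the first axis.
points = [(1, 10), (2, 20), (3, 30), (4, 40), (100, 50)]
print(median_center(points))   # (3, 30)
```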
Median robustness
The median is a more robust estimator of the central tendency than the mean
• green is the median
• pink is the arithmetic mean
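A tiny numerical illustration of this robustness, using made-up data in which one value is corrupted (the datasets are assumptions for the example, not taken from the slides):

```python
import statistics

clean = [1, 2, 3, 4, 5]
corrupted = [1, 2, 3, 4, 500]   # hypothetical recording error on the last value

for name, data in (("clean", clean), ("corrupted", corrupted)):
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))
```

The single bad value drags the mean from 3 to 102, while the median stays at 3.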
Statistical dispersion
The following datasets have the same central tendency
But they have different dispersions
Standard deviation
A traditional measure of dispersion is the standard deviation σ, the square root of the variance σ²
$$\sigma^2 = \frac{1}{n-1} \sum_{i=1}^{n} (a_i - \bar{a})^2$$
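A direct, two-pass transcription of this formula could look like the sketch below. The name `sample_std` and the dataset are illustrative; `statistics.stdev` implements the same n − 1 definition and serves as a cross-check.

```python
import math
import statistics

def sample_std(data):
    """Standard deviation with the n - 1 denominator (two passes over the data)."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two samples are needed")
    mean = math.fsum(data) / n                           # first pass: the mean
    squares = math.fsum((x - mean) ** 2 for x in data)   # second pass: squared deviations
    return math.sqrt(squares / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5, squared deviations sum to 32
print(sample_std(data))
print(statistics.stdev(data))     # library equivalent
```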
Standard deviation computation
Robust computation of the standard deviation ⇒ Knuth-Welford algorithm
    import math

    def stdDev(data):
        n = 0
        mean = 0.0
        M2 = 0.0   # running sum of squared deviations
        # Shift the data by an estimate of the mean to keep the numbers small
        meanEstimate = math.fsum(data) / len(data)
        for x in data:
            y = x - meanEstimate
            n = n + 1
            delta = y - mean
            mean = mean + delta / n
            M2 = M2 + delta * (y - mean)
        return math.sqrt(M2 / (n - 1))
Standard deviation
The standard deviation suffers from the same robustness issues as the mean. We will see why later.
Quartiles
The lower quartile or first quartile is the value such that 75% of the values are higher and 25% of the values are lower
a = [6, 1, 2, 7, 9, 6, 3, 4, 5, 2, 6]
sorted: a = [1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]
$q_1 = 2$
The upper quartile or third quartile is the value such that 25% of the values are higher and 75% of the values are lower
a = [6, 1, 2, 7, 9, 6, 3, 4, 5, 2, 6]
sorted: a = [1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]
$q_3 = 6$
Where is the second quartile? ⇒ It is the median
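The three quartiles of the example can be checked with the standard library, assuming Python 3.8 or later for `statistics.quantiles`. Several conventions exist for quartiles of small samples; the default `exclusive` method used here happens to agree with the values on the slides, other methods may not.

```python
import statistics

a = [6, 1, 2, 7, 9, 6, 3, 4, 5, 2, 6]

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(a, n=4)
print(q1, q2, q3)              # 2.0 5.0 6.0
print(statistics.median(a))    # 5, the second quartile
```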
Interquartile range
The difference Q3 − Q1 is the interquartile range or IQR ⇒ it is a more robust dispersion measure than the standard deviation
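A short sketch of why the IQR is called robust: corrupting one value leaves it unchanged while the standard deviation explodes. The helper `iqr` and the corrupted dataset are assumptions for this example; quartiles follow the default convention of `statistics.quantiles` (Python 3.8+).

```python
import statistics

def iqr(data):
    """Interquartile range Q3 - Q1."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

base = [1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]
corrupted = [1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 900]   # largest value replaced by a wild one

for name, data in (("base", base), ("corrupted", corrupted)):
    print(name, "IQR =", iqr(data), "std dev =", statistics.stdev(data))
```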
normal distribution
A model for random variables, with 2 parameters µ and σ
[Figure: density curve of a normal distribution, x from −6 to 6, density from 0.00 to 0.40]
The normal distribution has 2 parameters µ and σ.
$$\Phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
This is the probability density of the normal distribution.
It tells how likely values around x are, according to this distribution: the probability of falling in an interval is the area under Φ over that interval.
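The density can be evaluated with a few lines of Python. This is a sketch: `normal_pdf` and `probability_between` are names invented for the example, the integration is a plain midpoint rule, and `statistics.NormalDist` (Python 3.8+) is used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with parameters mu and sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def probability_between(lo, hi, mu=0.0, sigma=1.0, steps=10_000):
    """Midpoint-rule approximation of the area under the density on [lo, hi]."""
    width = (hi - lo) / steps
    return sum(normal_pdf(lo + (k + 0.5) * width, mu, sigma) for k in range(steps)) * width

print(normal_pdf(0.3, mu=1.0, sigma=2.0))
print(NormalDist(1.0, 2.0).pdf(0.3))     # library value for comparison
print(probability_between(-1.0, 1.0))    # area within one sigma of the mean
```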
µ is the mode, the central tendency of the normal distribution
[Figure: normal distribution density, x from −6 to 6, density from 0.00 to 0.40]
If some data follow a normal distribution, then the arithmetic mean estimates µ
$$\mu \approx \bar{a}$$
The more samples, the more accurate this estimate will be
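This can be observed on simulated data. The sketch below draws samples with `random.gauss` from a normal distribution with arbitrarily chosen parameters (µ = 3, σ = 2, and a fixed seed, all assumptions of the example) and prints the sample mean for growing sample sizes.

```python
import random
import statistics

def sample_mean(n, mu, sigma, seed=42):
    """Mean of n pseudo-random draws from a normal distribution."""
    rng = random.Random(seed)   # fixed seed so that runs are repeatable
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))

mu, sigma = 3.0, 2.0
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n, mu, sigma))
```

The printed means should settle closer and closer to 3.0 as n grows, with an error that shrinks roughly like σ/√n.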
σ controls the shape of the normal distribution: the smaller σ, the narrower and taller the curve
[Figure: normal distribution densities for several values of σ, x from −6 to 6, density from 0.0 to 1.6]
If some data follow a normal distribution, then σ is estimated by
$$\sigma^2 = \frac{1}{n-1} \sum_{i=1}^{n} (a_i - \bar{a})^2$$
The standard deviation comes from here ⇒ it is the dispersion of a normal distribution
µ and σ are completely independent parameters
[Figure: normal distribution densities for several values of µ and σ, x from −6 to 6, density from 0.0 to 1.6]
Practical interpretation of the normal distribution
[Figure: normal density split into bands at µ ± 1σ, µ ± 2σ and µ ± 3σ; moving outwards from µ, the bands on each side hold 34.1%, 13.6%, 2.1% and 0.1% of the values]
68% of the values within [µ − σ, µ + σ]
95% of the values within [µ− 2σ, µ + 2σ]
99.7% of the values within [µ− 3σ, µ + 3σ]
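These three percentages follow from the normal distribution itself: the fraction of values within k standard deviations of µ equals erf(k/√2). A minimal check, where `fraction_within` is a name chosen for this sketch:

```python
import math

def fraction_within(k):
    """Fraction of a normal population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * fraction_within(k):.1f}%")
# within 1 sigma: 68.3%
# within 2 sigma: 95.4%
# within 3 sigma: 99.7%
```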
skewed distributions
Your data might not have a symmetric distribution ⇒ they might have a skewed distribution
[Figure: density of a skewed distribution, x from 0.0 to 3.0, with three vertical marks]
• red is the true central tendency
• green is the median
• pink is the arithmetic mean
You can compute the skewness of your data
$$\frac{\frac{1}{n} \sum_{i=1}^{n} (a_i - \bar{a})^3}{\left( \frac{1}{n} \sum_{i=1}^{n} (a_i - \bar{a})^2 \right)^{\frac{3}{2}}}$$
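A direct transcription of this ratio, as a sketch (the function name and the test datasets are illustrative assumptions). A symmetric dataset gives 0, a long tail on the right gives a positive value and a long tail on the left a negative one.

```python
import math

def skewness(data):
    """Third central moment divided by the second central moment to the power 3/2."""
    n = len(data)
    mean = math.fsum(data) / n
    m2 = math.fsum((x - mean) ** 2 for x in data) / n
    m3 = math.fsum((x - mean) ** 3 for x in data) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))     # symmetric data
print(skewness([1, 1, 1, 1, 10]))    # long tail on the right
print(skewness([-10, -1, -1, -1, -1]))  # long tail on the left
```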
multimodal distributions
Your data might have multiple modes
[Figure: density of a multimodal distribution, x from −3 to 5]
In such a case, the mean, the median and other descriptive quantities might have no reliable meaning
[Figure: density of a multimodal distribution, x from −3 to 5]
Observe your data
Descriptive statistics can completely miss important information in your data!
Anscombe's quartet
[Figure: the four scatter plots of Anscombe's quartet, x from 0 to 20, y from 4 to 12]
Those 4 datasets have almost exactly the same
• mean
• variance
• regression line
But they are not the same thing at all!
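The claim can be verified numerically. The sketch below uses the values of Anscombe's quartet as they are commonly reproduced from the 1973 paper (typed in here, not taken from the slides) and computes the mean and variance of y and the least-squares line for each dataset. The helper names are illustrative.

```python
import statistics

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def regression_line(xs, ys):
    """Least-squares slope and intercept of y against x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def summary(xs, ys):
    """Mean of y, sample variance of y, slope and intercept of the fitted line."""
    slope, intercept = regression_line(xs, ys)
    return statistics.fmean(ys), statistics.variance(ys), slope, intercept

for name, (xs, ys) in quartet.items():
    mean_y, var_y, slope, intercept = summary(xs, ys)
    print(f"{name:>3}: mean={mean_y:.2f} var={var_y:.2f} line: y = {intercept:.2f} + {slope:.2f} x")
```

All four rows print nearly identical numbers, which is exactly why the plots are needed to tell the datasets apart.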
Boxplot
A nice way to summarize a data distribution is the boxplot
The red mark shows the median
The box goes from the lower quartile to the upper quartile
The box thus contains the median and holds the middle 50% of the values
The whiskers extend to the minimum and maximum values that are not outliers
Outlier values are shown as blue crosses
Outliers are values which lie more than 1.5 × IQR beyond the quartiles
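The 1.5 × IQR rule is easy to apply by hand. In this sketch, `find_outliers` and the dataset are hypothetical, and the quartiles follow the default convention of `statistics.quantiles` (Python 3.8+); plotting libraries may compute quartiles slightly differently.

```python
import statistics

def find_outliers(data, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    reach = k * (q3 - q1)
    low, high = q1 - reach, q3 + reach
    return [x for x in data if x < low or x > high]

data = [1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9, 30]
print(find_outliers(data))   # [30]
```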
Scatter plot
A scatter plot is simply a plot with the data as points along 2 dimensions
[Figure: scatter plot, x from −3 to 3, y from −5 to 4]