lawson-lewing
Measures of variation
Variability measures
• In addition to locating the center of the observed values of a variable, another important aspect of a descriptive study is to measure numerically the extent of variation around that center. Two data sets for the same variable may have similar centers and yet differ markedly in their variability.
• Variability measures should have the following properties:
- they take their minimum value when all the values of the distribution are the same;
- they increase as the differences among the values of the distribution increase.
Shops
Shop  Revenues  Costs  Employees  Place          Director gender  Online shop  R.O.
1     350       205    5          city           male             yes          145
2     200       100    3          suburbs        male             yes          100
3     600       350    10         near the city  female           no           250
4     500       270    10         suburbs        female           no           230
5     270       200    6          city           male             no           70
6     180       120    3          city           male             no           60
7     205       105    3          suburbs        male             no           100
8     340       210    5          near the city  female           no           120
9     280       140    4          city           female           yes          140
Variability
Revenue: observed distribution and three possible distributions (A, B, C)
Observed  A    B    C
350       325  300  140
200       325  350  270
600       325  400  830
500       325  200  605
270       325  300  120
180       325  325  200
205       325  300  190
340       325  400  200
280       325  350  370
All three possible distributions have the same mean as the observed one, x̄ = 325.
BUT the distributions are very different!
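As a quick check of the claim above, the sketch below computes the mean of each column with Python's standard `statistics.fmean` (Python 3.8+). The dictionary name `revenues` is illustrative and not part of the original material.

```python
from statistics import fmean

# Observed shop revenues and the three possible distributions A, B, C
revenues = {
    "observed": [350, 200, 600, 500, 270, 180, 205, 340, 280],
    "A": [325] * 9,
    "B": [300, 350, 400, 200, 300, 325, 300, 400, 350],
    "C": [140, 270, 830, 605, 120, 200, 190, 200, 370],
}

# Every column has the same mean...
means = {name: fmean(values) for name, values in revenues.items()}

for name, values in revenues.items():
    # ...but the sorted values show very different spreads
    print(f"{name:>8}: mean = {means[name]:.1f}, sorted = {sorted(values)}")
```

Every mean prints as 325.0, while the sorted lists make the different spreads visible.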
Some measures of variability
Range: the width of the interval that contains all the values of the distribution.
$$\text{range} = x_{\max} - x_{\min}$$
Interquartile range: the width of the interval that contains the central 50% of the values of the distribution.
$$d_Q = Q_3 - Q_1$$
where Q_1 and Q_3 are the first and third quartiles.
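Both measures can be sketched in a few lines of Python, assuming the standard `statistics.quantiles` function for the quartiles. The slides do not state which quartile rule they use, so treat the choice of `method` as an assumption: different conventions give different Q_1 and Q_3 on small samples. The function names `value_range` and `interquartile_range` are hypothetical helpers.

```python
from statistics import quantiles

def value_range(values):
    """Range: width of the interval containing all values, x_max - x_min."""
    return max(values) - min(values)

def interquartile_range(values, method="exclusive"):
    """Interquartile range d_Q = Q3 - Q1.

    Q1 and Q3 come from statistics.quantiles; the result depends on the
    interpolation convention ("exclusive" or "inclusive"), so it can differ
    from a hand calculation that uses another quartile rule.
    """
    q1, _median, q3 = quantiles(values, n=4, method=method)
    return q3 - q1

revenue = [350, 200, 600, 500, 270, 180, 205, 340, 280]
print(value_range(revenue))                              # 420
print(interquartile_range(revenue))                      # 222.5 ("exclusive")
print(interquartile_range(revenue, method="inclusive"))  # 145.0 ("inclusive")
```

The two quartile conventions give noticeably different IQRs on only nine values, which is why the rule should always be stated.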
Example
Revenue
                       Observed  A    B    C
                       350       325  300  140
                       200       325  350  270
                       600       325  400  830
                       500       325  200  605
                       270       325  300  120
                       180       325  325  200
                       205       325  300  190
                       340       325  400  200
                       280       325  350  370
x_min                  180       325  200  120
x_max                  600       325  400  830
Range = x_max - x_min  420       0    200  710
A: no variability, all values are the same.
From A to B and from B to C the variability increases, and so does the range.
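The range row of the table can be reproduced with a short Python sketch; the dictionary name `revenues` is illustrative. The final assertion encodes the observation that the range grows from A to B to C.

```python
# Observed revenues and the three possible distributions A, B, C
revenues = {
    "observed": [350, 200, 600, 500, 270, 180, 205, 340, 280],
    "A": [325] * 9,
    "B": [300, 350, 400, 200, 300, 325, 300, 400, 350],
    "C": [140, 270, 830, 605, 120, 200, 190, 200, 370],
}

# Range = x_max - x_min for every column
ranges = {name: max(values) - min(values) for name, values in revenues.items()}

for name, r in ranges.items():
    print(f"{name:>8}: range = {r}")

# Variability (measured by the range) grows from A to B to C
assert ranges["A"] < ranges["B"] < ranges["C"]
```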
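Deviation from the mean
The variance σ² is a function of the differences between each value x_i and the mean x̄:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{\mathrm{Dev}(X)}{n}, \qquad \sigma^2 \ge 0$$
where the sum of squared deviations is
$$\mathrm{Dev}(X) = \sum_{i=1}^{n}(x_i-\bar{x})^2$$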
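A minimal Python sketch of the two definitions above, using the shops' revenues as input. Note that the variance divides by n, as in the formula above (population form), not by n - 1. The helper names `deviance` and `variance` are illustrative; the standard library's `statistics.pvariance` is used only as a cross-check.

```python
from statistics import pvariance

def deviance(values):
    """Dev(X): sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def variance(values):
    """Variance as defined above: Dev(X) / n (divides by n, not n - 1)."""
    return deviance(values) / len(values)

revenue = [350, 200, 600, 500, 270, 180, 205, 340, 280]
print(deviance(revenue))   # 163200.0
print(variance(revenue))   # about 18133.33
print(pvariance(revenue))  # standard-library population variance, same value
```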
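The standard deviation σ is the square root of the variance:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$
The coefficient of variation CV is the ratio of the standard deviation to the mean, multiplied by 100:
$$CV = \frac{\sigma}{\bar{x}} \times 100, \qquad \bar{x} \neq 0$$
Being a pure number (a percentage), the CV allows comparing variability across distributions with different means or units.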
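The sketch below computes σ and the CV for the shops' revenues, assuming the population standard deviation (`statistics.pstdev`, which divides by n) to match the formula above. The function name `coefficient_of_variation` is illustrative; it refuses a zero mean, where the CV is undefined.

```python
from statistics import fmean, pstdev

def coefficient_of_variation(values):
    """CV = sigma / mean * 100, with sigma the population standard deviation."""
    mean = fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return pstdev(values) / mean * 100

revenue = [350, 200, 600, 500, 270, 180, 205, 340, 280]
sigma = pstdev(revenue)
print(round(sigma, 1))                              # 134.7
print(round(coefficient_of_variation(revenue), 1))  # 41.4
```

So the revenues vary, on average, by roughly 41% of their mean.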
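Example
Revenue x_j  Difference from mean (x_j - x̄)  Squared difference (x_j - x̄)²
350          25                              625
200          -125                            15625
600          275                             75625
500          175                             30625
270          -55                             3025
180          -145                            21025
205          -120                            14400
340          15                              225
280          -45                             2025
Mean: x̄ = 325
Mean property:
$$\sum_{i=1}^{n}(x_i-\bar{x}) = 0$$
$$\mathrm{Dev}(X) = \sum_{i=1}^{n}(x_i-\bar{x})^2 = 163200$$
$$\sigma^2 = \frac{\mathrm{Dev}(X)}{n} = \frac{163200}{9} = 18133{,}3$$
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2} = \sqrt{18133{,}3} = 134{,}7$$
Sum of squared deviations = 163200
Variance = 18133,3
Std. dev. = 134,7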
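The worked table can be rebuilt column by column in Python, which also confirms the mean property (the deviations sum to zero). This is a plain sketch with no library beyond built-ins; variable names are illustrative.

```python
revenue = [350, 200, 600, 500, 270, 180, 205, 340, 280]
n = len(revenue)
mean = sum(revenue) / n  # 325.0

# Columns of the table: deviations and squared deviations
deviations = [x - mean for x in revenue]
squared = [d ** 2 for d in deviations]

for x, d, s in zip(revenue, deviations, squared):
    print(f"{x:>5} {d:>8.0f} {s:>10.0f}")

# Mean property: deviations from the mean always sum to zero
print("sum of deviations:", sum(deviations))

dev = sum(squared)     # Dev(X)
var = dev / n          # variance, dividing by n
std = var ** 0.5       # standard deviation
print(dev, round(var, 1), round(std, 1))  # 163200.0 18133.3 134.7
```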
Variability of the shops' revenues
• A low degree of variability indicates that the shops achieve similar performance (their revenues differ little from one another).
• Conversely, a high degree of variability shows that the sales results obtained by the different shops are rather heterogeneous.
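Variance from a frequency distribution
When the data are given as a frequency distribution with K distinct values x_j, each observed n_j times (n = Σ n_j), the variance is
$$\sigma^2 = \frac{1}{n}\sum_{j=1}^{K}(x_j-\bar{x})^2\, n_j = \frac{54{,}88}{9} = 6{,}10$$
Employees (x_j)  Shops (n_j)  (x_j - x̄)² n_j
3                2            19,34
4                1            4,45
6                3            0,04
7                1            0,79
10               2            30,26
Total            9            54,88
with x̄ = 6,11.
$$\sigma = \sqrt{6{,}10} = 2{,}47, \qquad CV = \frac{2{,}47}{6{,}11} \times 100 = 40{,}43\%$$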
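The same computation from the frequency table can be sketched in Python: each squared deviation is weighted by its frequency n_j. The code uses the exact mean 55/9 rather than the rounded 6,11, so the last digits may differ slightly from the hand calculation above.

```python
from math import sqrt

# Frequency table: number of employees x_j and number of shops n_j
values = [3, 4, 6, 7, 10]
freqs = [2, 1, 3, 1, 2]

n = sum(freqs)                                            # 9 shops
mean = sum(x * f for x, f in zip(values, freqs)) / n      # weighted mean
dev = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
variance = dev / n
std_dev = sqrt(variance)
cv = std_dev / mean * 100

print(f"mean={mean:.2f} Dev={dev:.2f} var={variance:.2f} "
      f"sd={std_dev:.2f} CV={cv:.1f}%")
```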
Standardised values
If a quantitative variable X has mean x̄ and standard deviation σ, its standardised values are
$$y_i = \frac{x_i - \bar{x}}{\sigma}, \qquad i = 1, \dots, n$$
The distribution of Y has zero mean and standard deviation equal to 1.
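A minimal sketch of standardisation, assuming the population standard deviation (`statistics.pstdev`) so that it matches σ as defined earlier. The helper name `standardise` is illustrative; it rejects a constant variable, whose σ is 0.

```python
from statistics import fmean, pstdev

def standardise(values):
    """Return y_i = (x_i - mean) / sigma, with sigma the population std."""
    mean = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardise a variable with zero variability")
    return [(x - mean) / sigma for x in values]

revenue = [350, 200, 600, 500, 270, 180, 205, 340, 280]
z = standardise(revenue)
print([round(v, 2) for v in z])
# Mean close to 0 and standard deviation close to 1 (up to rounding)
print(fmean(z), pstdev(z))
```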
Comparison between two funds (equal mean)
Over the last 5 years, funds F1 and F2 had the same mean performance, but their variances differ: Var(F1) > Var(F2).
Year  F1    F2
2003  7,7   6,4
2004  6,1   5,9
2005  0,4   3,2
2006  9,8   7,1
2007  3,5   4,9
mean  5,5   5,5
var   10,7  1,8
Higher variability means that performances far from the mean occur more often: higher volatility, hence higher risk.
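The summary rows of the table can be checked with Python's `statistics` module. The slide values match the population variance (`pvariance`, dividing by n); `statistics.variance`, which divides by n - 1, would give larger values.

```python
from statistics import fmean, pvariance

# Yearly performance 2003-2007 of the two funds
f1 = [7.7, 6.1, 0.4, 9.8, 3.5]
f2 = [6.4, 5.9, 3.2, 7.1, 4.9]

for name, perf in (("F1", f1), ("F2", f2)):
    # Same mean, very different variance
    print(f"{name}: mean={fmean(perf):.1f} var={pvariance(perf):.1f}")
```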
Comparison between the performance of two funds (different means)
F1 has both a higher mean and a higher variance than F2.
Can we say that F1 is a riskier fund than F2?
Year  F1    F2
2003  9,7   1,4
2004  7,1   1,9
2005  0,9   2,2
2006  9,9   2,1
2007  7,5   4,9
mean  7,0   2,5
var   10,6  1,5
CV    46,5  49,3
When the means differ, we have to compare the CVs: F1 has the lower CV (46,5 against 49,3), so relative to its mean it is less variable than F2.
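A short Python sketch reproducing the comparison: variances are computed dividing by n (as in the table), and the CV uses the population standard deviation. The helper name `cv` is illustrative.

```python
from statistics import fmean, pstdev, pvariance

# Yearly performance 2003-2007 of the two funds
f1 = [9.7, 7.1, 0.9, 9.9, 7.5]
f2 = [1.4, 1.9, 2.2, 2.1, 4.9]

def cv(values):
    """Coefficient of variation in percent (population standard deviation)."""
    return pstdev(values) / fmean(values) * 100

for name, perf in (("F1", f1), ("F2", f2)):
    print(f"{name}: mean={fmean(perf):.2f} "
          f"var={pvariance(perf):.2f} CV={cv(perf):.1f}%")

# F1 has the larger variance but the smaller relative variability
assert pvariance(f1) > pvariance(f2) and cv(f1) < cv(f2)
```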