jacob-harmon
Descriptive Statistics-IV (Measures of Variation)
QSCI 381, Lecture 6 (Larson and Farber, Sect. 2.4)
Deviation, Variance and Standard Deviation-I
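The deviation of a data entry $x_i$ in a population data set is the difference between $x_i$ and the population mean $\mu$, i.e.

$$\text{Deviation of } x_i = x_i - \mu$$

The sum of the deviations over all entries is zero.

The population variance is the sum of the squared deviations over all entries, divided by the population size N (i.e. the mean squared deviation):

$$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$$

$\sigma$ is the Greek letter sigma.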
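A minimal sketch of the two facts above. It assumes, purely for illustration, that the five bowhead lengths from the later example slide form a complete population; the variable names are hypothetical.

```python
# Treat the five bowhead lengths (m) from the example slide as a whole population (assumption).
lengths = [8.5, 8.4, 13.8, 9.3, 9.7]

N = len(lengths)
mu = sum(lengths) / N                      # population mean
deviations = [x - mu for x in lengths]     # x_i - mu for each entry

# The deviations sum to zero, up to floating-point rounding.
print(sum(deviations))

# Population variance: sum of squared deviations divided by N.
sigma2 = sum(d ** 2 for d in deviations) / N
print(sigma2)
```

Because of floating-point rounding, the printed sum of deviations may be a tiny number such as 1e-15 rather than exactly 0.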
Deviation, Variance and Standard Deviation-II
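The population standard deviation is the square root of the population variance, i.e.:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_i (x_i - \mu)^2}{N}}$$

Note: these quantities relate to the population and not to a sample from the population.
Note: the standard deviation is sometimes (loosely) referred to as the standard error; more commonly, "standard error" means the standard deviation of a statistic, such as the sample mean.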
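The population standard deviation formula can be checked against Python's standard library, whose `statistics.pstdev` also divides by N. This is a sketch; `population_sd` is a hypothetical helper and the bowhead lengths are again treated as a full population only for illustration.

```python
import math
import statistics

def population_sd(data):
    """sigma = sqrt(sum((x_i - mu)^2) / N)."""
    N = len(data)
    mu = sum(data) / N
    return math.sqrt(sum((x - mu) ** 2 for x in data) / N)

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]   # treated as a full population here
sigma = population_sd(lengths)

# Cross-check against the standard library's population standard deviation.
print(sigma, statistics.pstdev(lengths))
```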
The Sample Variance and Standard Deviation
The sample variance and the sample standard deviation of a data set with n entries are given by:

$$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$$

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum_i x_i^2 - n\bar{x}^2}{n - 1}}$$

Note the division by n - 1 rather than by N or n.
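A short sketch of the two equivalent forms of s given above, compared with `statistics.stdev` (which also divides by n - 1). The function names are hypothetical; the data are the bowhead lengths from the example slide.

```python
import math
import statistics

def sample_sd_definition(data):
    """s = sqrt(sum((x_i - xbar)^2) / (n - 1))."""
    n = len(data)
    xbar = sum(data) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

def sample_sd_shortcut(data):
    """s = sqrt((sum(x_i^2) - n * xbar^2) / (n - 1))."""
    n = len(data)
    xbar = sum(data) / n
    return math.sqrt((sum(x * x for x in data) - n * xbar ** 2) / (n - 1))

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]
print(sample_sd_definition(lengths), sample_sd_shortcut(lengths), statistics.stdev(lengths))
```

All three values agree to rounding error; dividing by n - 1 makes s slightly larger than the corresponding population value.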
Calculating Standard Deviations
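| Step | Population | Sample |
|---|---|---|
| Find the mean | $\mu = \sum x_i / N$ | $\bar{x} = \sum x_i / n$ |
| Find the deviation for each entry | $x_i - \mu$ | $x_i - \bar{x}$ |
| Square each deviation | $(x_i - \mu)^2$ | $(x_i - \bar{x})^2$ |
| Add to get the sum of squares ($SS_x$) | $SS_x = \sum (x_i - \mu)^2$ | $SS_x = \sum (x_i - \bar{x})^2$ |
| Divide by N or (n - 1) to get the variance | $\sigma^2 = SS_x / N$ | $s^2 = SS_x / (n - 1)$ |
| Take the square root to get the standard deviation | $\sigma = \sqrt{SS_x / N}$ | $s = \sqrt{SS_x / (n - 1)}$ |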
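The six steps of the table map directly onto code. In this sketch, `sd_by_steps` is a hypothetical helper and the small data set is purely illustrative (chosen so the population answer is exactly 2).

```python
import math

def sd_by_steps(data, population):
    """Follow the table: divide by N if population is True, else by n - 1.
    Returns (SSx, variance, standard deviation)."""
    count = len(data)
    mean = sum(data) / count                       # step 1: find the mean
    deviations = [x - mean for x in data]          # step 2: deviation of each entry
    squared = [d ** 2 for d in deviations]         # step 3: square each deviation
    ss_x = sum(squared)                            # step 4: sum of squares SSx
    divisor = count if population else count - 1   # step 5: N or (n - 1)
    variance = ss_x / divisor
    return ss_x, variance, math.sqrt(variance)     # step 6: square root

demo = [2, 4, 4, 4, 5, 5, 7, 9]                    # illustrative data only
print(sd_by_steps(demo, population=True))
print(sd_by_steps(demo, population=False))
```

The same SSx feeds both columns; only the divisor in step 5 changes.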
Example
Find the standard deviation of the following bowhead lengths (in m): 8.5, 8.4, 13.8, 9.3, 9.7.
Key question (before doing anything): is this a sample or a population?
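Since the slide leaves the sample-or-population question open, this sketch computes the answer both ways from the same sum of squares; which one applies depends on how the five whales were obtained.

```python
import math

lengths_m = [8.5, 8.4, 13.8, 9.3, 9.7]   # bowhead lengths (m) from the example

n = len(lengths_m)
mean = sum(lengths_m) / n
ss_x = sum((x - mean) ** 2 for x in lengths_m)

sample_sd = math.sqrt(ss_x / (n - 1))     # if the whales are a sample
population_sd = math.sqrt(ss_x / n)       # if they are the whole population

print(f"mean = {mean:.2f} m, SSx = {ss_x:.3f}")
print(f"sample s = {sample_sd:.3f} m, population sigma = {population_sd:.3f} m")
```

This gives a mean of 9.94 m and SSx = 19.812, so s is about 2.23 m under the sample interpretation and sigma is about 1.99 m under the population one.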
Formulae in EXCEL
Calculating means: =AVERAGE(A1:A10)
Calculating standard deviations: =STDEV(A1:A10). This calculates the sample, not the population, standard deviation! Use =STDEVP(A1:A10) for the population standard deviation.
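For readers working outside Excel, Python's `statistics` module offers rough counterparts. The mapping in the comments is an assumption based on the divisor each function uses, and the list simply stands in for a spreadsheet range.

```python
import statistics

# Assumed rough correspondence with the Excel formulas above:
#   =AVERAGE(range) -> statistics.mean
#   =STDEV(range)   -> statistics.stdev   (sample, divides by n - 1)
#   =STDEVP(range)  -> statistics.pstdev  (population, divides by N)
cells = [8.5, 8.4, 13.8, 9.3, 9.7]   # stands in for a hypothetical range A1:A5

average = statistics.mean(cells)
stdev_sample = statistics.stdev(cells)
stdev_population = statistics.pstdev(cells)
print(average, stdev_sample, stdev_population)
```

As in Excel, the sample version is always the larger of the two for the same data.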
Standard Deviations-I
[Figure: three frequency histograms (x-axis: Data value, -5 to 25; y-axis: Frequency, 0 to 60) with SD = 0, SD = 2.1 and SD = 5.3.]
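Standard Deviations-II (Symmetric Bell-shaped distributions)
[Figure: a bell-shaped curve over the data values, marked at -3SD, -2SD, -SD, SD, 2SD and 3SD from the mean. About 68% of the data lie within 1 SD of the mean (34% on each side), about 95% within 2 SD (13.5% between 1 and 2 SD on each side) and about 99.7% within 3 SD.]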
Chebychev's Theorem: the proportion of the data lying within k standard deviations (k > 1) of the mean is at least $1 - 1/k^2$, whatever the shape of the distribution.
k = 2: proportion ≥ 75%
k = 3: proportion ≥ 88.9%
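A sketch comparing Chebychev's lower bound with the observed proportion of entries within k standard deviations. The helpers are hypothetical, and using the bowhead lengths with a population-style SD is an illustrative assumption.

```python
def chebychev_bound(k):
    """Minimum proportion of data within k SDs of the mean, for k > 1."""
    if k <= 1:
        raise ValueError("the theorem is stated for k > 1")
    return 1 - 1 / k ** 2

def proportion_within(data, k):
    """Observed proportion of entries within k (population) SDs of the mean."""
    n = len(data)
    mu = sum(data) / n
    sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5
    inside = sum(1 for x in data if abs(x - mu) <= k * sigma)
    return inside / n

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]
for k in (2, 3):
    print(k, chebychev_bound(k), proportion_within(lengths, k))
```

The bound is deliberately conservative: for bell-shaped data about 95% lie within 2 SD, while Chebychev guarantees only 75% for any distribution.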
Standard Deviations-III (Grouped data)
The standard deviation of a frequency distribution is:

$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2 f_i}{n - 1}}, \qquad n = \sum_i f_i, \qquad \bar{x} = \frac{\sum_i x_i f_i}{n}$$

where $x_i$ is the $i$th value in the frequency distribution and $f_i$ is the frequency of the $i$th data entry.
Note: where the frequency distribution consists of bins that are ranges, $x_i$ should be the midpoint of bin $i$ (be careful with the first and last bins).
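A minimal sketch of the grouped-data formula. The bin midpoints and frequencies are hypothetical, chosen only so the result can be checked against `statistics.stdev` on the equivalent expanded data.

```python
import math
import statistics

def grouped_sd(midpoints, freqs):
    """s = sqrt(sum((x_i - xbar)^2 * f_i) / (n - 1)), with n = sum(f_i)."""
    n = sum(freqs)
    xbar = sum(x * f for x, f in zip(midpoints, freqs)) / n
    ss = sum((x - xbar) ** 2 * f for x, f in zip(midpoints, freqs))
    return math.sqrt(ss / (n - 1))

midpoints = [1, 3, 5]   # hypothetical bin midpoints
freqs = [2, 3, 1]       # hypothetical frequencies

# Expanding the table into raw entries should give the same answer.
expanded = [x for x, f in zip(midpoints, freqs) for _ in range(f)]
print(grouped_sd(midpoints, freqs), statistics.stdev(expanded))
```

With binned data the two agree only when every entry sits exactly at its bin midpoint; otherwise the grouped value is an approximation.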
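Standard Deviations-IV (The shortcut formula)

$$
\begin{aligned}
s^2 &= \frac{1}{n-1}\sum_i (x_i - \bar{x})^2 \\
    &= \frac{1}{n-1}\sum_i \left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) \\
    &= \frac{1}{n-1}\left(\sum_i x_i^2 - 2\bar{x}\sum_i x_i + n\bar{x}^2\right) \\
    &= \frac{1}{n-1}\left(\sum_i x_i^2 - 2\bar{x}\cdot n\bar{x} + n\bar{x}^2\right) \\
    &= \frac{1}{n-1}\left(\sum_i x_i^2 - n\bar{x}^2\right)
\end{aligned}
$$

The fourth line uses $\sum_i x_i = n\bar{x}$.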
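The derivation is exact algebra, but in floating-point arithmetic the shortcut subtracts two large, nearly equal quantities. This sketch compares both forms; the shift by 1e9 is a hypothetical transformation that keeps the spread unchanged while making the values large.

```python
def var_two_pass(data):
    """Definitional form: sum((x_i - xbar)^2) / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def var_shortcut(data):
    """Shortcut form: (sum(x_i^2) - n * xbar^2) / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    return (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]
shifted = [x + 1e9 for x in lengths]   # same spread, hypothetical large offset

for label, data in (("original", lengths), ("shifted", shifted)):
    print(label, var_two_pass(data), var_shortcut(data))
```

On the original lengths the two forms agree to rounding error. On the shifted copy the shortcut works with sums of order 10^18 and can lose most or all of its accuracy, while the two-pass form stays close to 4.953. This is why the shortcut is handy by hand but is usually avoided in software.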
The Coefficient of Variation
The coefficient of variation (CV) is the standard deviation divided by the mean, often expressed as a percentage:

$$CV = \frac{s}{\bar{x}} \times 100\%$$

The coefficient of variation is dimensionless and can be used to compare data sets measured in different units.
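A sketch showing that the CV does not depend on units: the bowhead lengths give the same CV in metres and in centimetres. `coefficient_of_variation` is a hypothetical helper using the sample SD.

```python
import statistics

def coefficient_of_variation(data):
    """CV = s / xbar * 100 (percent); assumes a nonzero mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100

lengths_m = [8.5, 8.4, 13.8, 9.3, 9.7]
lengths_cm = [x * 100 for x in lengths_m]   # same whales, different units

cv_m = coefficient_of_variation(lengths_m)
cv_cm = coefficient_of_variation(lengths_cm)
print(cv_m, cv_cm)
```

Both values are about 22.4%, whereas the standard deviations themselves differ by a factor of 100.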
Z-Scores
The z-score is calculated using the equation:

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$$
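A sketch of z-scores for the bowhead lengths. The formula above uses the population mean and SD; here the sample mean and sample SD stand in for them, which is an assumption.

```python
import statistics

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]
mean = statistics.mean(lengths)
sd = statistics.stdev(lengths)   # sample SD used as the estimate (assumption)

z_scores = [(x - mean) / sd for x in lengths]
for x, z in zip(lengths, z_scores):
    print(f"{x:5.1f} m  ->  z = {z:+.2f}")
```

The 13.8 m whale lies about 1.73 standard deviations above the mean, and the z-scores of a data set always sum to zero.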
Outliers-I
Outliers can lead to misinterpretation of results. They can arise from data errors (e.g. typing measurements in cm rather than in m) or from unusual events.
There are several rules for identifying outliers. One rule, based on the median $Q_2$ and the quartiles $Q_1$ and $Q_3$, is:
Outliers: $< Q_2 - 6(Q_2 - Q_1)$ or $> Q_2 + 6(Q_3 - Q_2)$
Strays: $< Q_2 - 3(Q_2 - Q_1)$ or $> Q_2 + 3(Q_3 - Q_2)$
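A sketch of the stray/outlier rule, applied to the bowhead lengths with the added length of 1 from the next slide. The quartiles come from `statistics.quantiles` with its default method; that choice is an assumption, and other quartile conventions can move the fences enough to change which points are flagged.

```python
import statistics

def classify(data):
    """Label each entry 'outlier', 'stray' or 'ok' using median-based fences."""
    q1, q2, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' method (assumption)
    lower, upper = q2 - q1, q3 - q2
    result = []
    for x in data:
        if x < q2 - 6 * lower or x > q2 + 6 * upper:
            label = "outlier"
        elif x < q2 - 3 * lower or x > q2 + 3 * upper:
            label = "stray"
        else:
            label = "ok"
        result.append((x, label))
    return result

lengths = [8.5, 8.4, 13.8, 9.3, 9.7, 1.0]   # bowhead lengths plus the added length of 1
for x, label in classify(lengths):
    print(x, label)
```

With this quartile method the length of 1 is flagged as a stray, while 13.8 m falls inside the fences.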
Outliers-II
Strays and outliers should be indicated on box-and-whisker plots. Consider the data set of bowhead lengths, except that a length of 1 is added!
[Figure: box-and-whisker plots of these lengths, one on an axis labeled Length (cm) and one on an axis labeled Length (m).]
Review of Symbols in this Lecture
$\sigma^2$: population variance ($\sigma$ is the Greek letter sigma).
$\sigma$: population standard deviation.
$s^2$: sample variance.
$s$: sample standard deviation.
Summary
We use descriptive statistics to “get a feel for the data” (also called “exploratory data analysis”). In general, we are using statistics from the sample to learn something about the population.