381 Descriptive Statistics-IV (Measures of Variation) QSCI 381 – Lecture 6 (Larson and Farber,...


381 Descriptive Statistics-IV

(Measures of Variation)

QSCI 381 – Lecture 6 (Larson and Farber, Sect 2.4)


Deviation, Variance and Standard Deviation-I

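The deviation of a data entry $x_i$ in a population data set is the difference between $x_i$ and the population mean $\mu$, i.e.

$$\text{Deviation of } x_i = x_i - \mu$$

The sum of the deviations over all entries is zero.

The population variance is the sum of the squared deviations over all entries, divided by the number of entries $N$:

$$\text{Population variance} = \sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$$

$\sigma$ is the Greek letter sigma.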


Deviation, Variance and Standard Deviation-II

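The population standard deviation is the square root of the population variance, i.e.:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_i (x_i - \mu)^2}{N}}$$

Note: these quantities relate to the population and not a sample from the population.

Note: the standard deviation is sometimes loosely referred to as the standard error; strictly, the standard error is the standard deviation of an estimate such as the sample mean.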


The Sample variance and Standard Deviation

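The sample variance and the sample standard deviation of a data set with $n$ entries are given by:

$$\text{sample variance} = s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$$

$$\text{sample standard deviation} = s = \sqrt{s^2} = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum_i x_i^2 - n\bar{x}^2}{n - 1}}$$

Note the division by $n - 1$ rather than $N$ or $n$.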


Calculating Standard Deviations

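| Step | Population | Sample |
|---|---|---|
| Find the mean | $\mu = \frac{\sum x_i}{N}$ | $\bar{x} = \frac{\sum x_i}{n}$ |
| Find the deviation for each entry | $x_i - \mu$ | $x_i - \bar{x}$ |
| Square each deviation | $(x_i - \mu)^2$ | $(x_i - \bar{x})^2$ |
| Add to get the sum of squares (SSx) | $SS_x = \sum (x_i - \mu)^2$ | $SS_x = \sum (x_i - \bar{x})^2$ |
| Divide by N or (n − 1) to get the variance | $\sigma^2 = SS_x / N$ | $s^2 = SS_x / (n - 1)$ |
| Take the square root to get the standard deviation | $\sigma = \sqrt{SS_x / N}$ | $s = \sqrt{SS_x / (n - 1)}$ |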


Example

Find the standard deviation of the following bowhead lengths (in m): (8.5, 8.4, 13.8, 9.3, 9.7)

Key question (before doing anything) – is this a sample or a population?
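Because the answer depends on whether the five lengths are treated as a population or as a sample, the following Python sketch works through the calculation steps both ways. The variable names are illustrative only, and the standard-library functions `statistics.pstdev` and `statistics.stdev` are used purely as a cross-check.

```python
import math
import statistics

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]  # bowhead lengths (m) from the example

n = len(lengths)
mean = sum(lengths) / n                     # step 1: find the mean
deviations = [x - mean for x in lengths]    # step 2: deviation of each entry
ss_x = sum(d ** 2 for d in deviations)      # steps 3-4: square and add (SSx)

sigma = math.sqrt(ss_x / n)                 # treated as a population: divide by N
s = math.sqrt(ss_x / (n - 1))               # treated as a sample: divide by n - 1

print(f"mean = {mean:.2f} m, SSx = {ss_x:.3f}")
print(f"population SD = {sigma:.2f} m, sample SD = {s:.2f} m")

# Cross-check against the standard library.
assert math.isclose(sigma, statistics.pstdev(lengths))
assert math.isclose(s, statistics.stdev(lengths))
```

For these data the mean is 9.94 m; the standard deviation is about 1.99 m if the lengths are a population and about 2.23 m if they are a sample.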


Formulae in EXCEL

Calculating means: =AVERAGE(A1:A10)

Calculating standard deviations: =STDEV(A1:A10) – this calculates the sample and not the population standard deviation! (=STDEVP(A1:A10) gives the population standard deviation.)


Standard Deviations-I

[Figure: three frequency histograms, each with Data value (−5 to 25) on the horizontal axis and Frequency (0 to 60) on the vertical axis, labelled SD=0, SD=2.1 and SD=5.3.]


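Standard Deviations-II (Symmetric Bell-shaped distributions)

[Figure: bell-shaped curve with Data values on the horizontal axis, marked at −3SD, −2SD, −SD, SD, 2SD and 3SD. About 68% of the data lie within 1 SD of the mean, 95% within 2 SD and 99.7% within 3 SD; about 34% lie between the mean and 1 SD on either side, and about 13.5% between 1 SD and 2 SD.]

Chebychev’s Theorem: The proportion of the data lying within $k$ standard deviations ($k > 1$) of the mean is at least $1 - 1/k^2$.

k = 2: proportion at least 75%

k = 3: proportion at least 88.9%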
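To compare Chebychev's bound with the bell-shaped percentages, here is a minimal Python sketch. The function names are hypothetical, and the comparison assumes an exactly normal distribution, which real data will only approximate.

```python
import math

def chebychev_lower_bound(k: float) -> float:
    """Minimum proportion of data within k SDs of the mean, for any distribution."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def normal_proportion(k: float) -> float:
    """Proportion within k SDs of the mean for an exactly normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (2, 3):
    print(f"k = {k}: Chebychev >= {chebychev_lower_bound(k):.3f}, "
          f"normal = {normal_proportion(k):.4f}")
```

The bound is much weaker than the bell-shaped figures because it has to hold for every distribution, not only symmetric ones.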


Standard Deviations-III (Grouped data)

The standard deviation of a frequency distribution is:

$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2 f_i}{n - 1}} = \sqrt{\frac{\sum_i (x_i - \bar{x})^2 f_i}{\sum_i f_i - 1}}$$

where $x_i$ is the $i$th value in the frequency distribution, and $f_i$ is the frequency of the $i$th data entry.

Note: where the frequency distribution consists of bins that are ranges, $x_i$ should be the midpoint of bin $i$ (be careful of the first and last bins).
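The grouped-data formula can be sketched in Python as follows. The function name `grouped_sd` and the midpoints and counts are hypothetical, chosen only for illustration; the check against `statistics.stdev` expands the table back into individual entries.

```python
import math
import statistics

def grouped_sd(values, freqs):
    """Sample SD of a frequency distribution with values x_i and frequencies f_i."""
    n = sum(freqs)                                          # n = sum of the f_i
    mean = sum(x * f for x, f in zip(values, freqs)) / n    # frequency-weighted mean
    ss = sum((x - mean) ** 2 * f for x, f in zip(values, freqs))
    return math.sqrt(ss / (n - 1))

# Hypothetical bin midpoints and counts (not from the lecture).
midpoints = [1, 2, 3, 4]
counts = [2, 3, 3, 2]

s_grouped = grouped_sd(midpoints, counts)

# Expanding the table into raw entries should give the same sample SD.
expanded = [x for x, f in zip(midpoints, counts) for _ in range(f)]
s_expanded = statistics.stdev(expanded)

print(round(s_grouped, 3), round(s_expanded, 3))
```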


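Standard Deviations-IV (The shortcut formula)

$$
\begin{aligned}
s^2 &= \frac{1}{n-1} \sum_i (x_i - \bar{x})^2 \\
    &= \frac{1}{n-1} \sum_i \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right) \\
    &= \frac{1}{n-1} \left( \sum_i x_i^2 - 2\bar{x} \sum_i x_i + \sum_i \bar{x}^2 \right) \\
    &= \frac{1}{n-1} \left( \sum_i x_i^2 - 2\bar{x} \cdot n \cdot \bar{x} + n\bar{x}^2 \right) \\
    &= \frac{1}{n-1} \left( \sum_i x_i^2 - n\bar{x}^2 \right)
\end{aligned}
$$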
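As a quick numerical check of the shortcut formula, the sketch below evaluates both forms of the sample variance on the bowhead lengths from the earlier example. The variable names are illustrative.

```python
import math

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]  # bowhead lengths (m)
n = len(lengths)
x_bar = sum(lengths) / n

# Definition: sum of squared deviations divided by n - 1.
var_definition = sum((x - x_bar) ** 2 for x in lengths) / (n - 1)

# Shortcut: (sum of squares - n * mean squared) divided by n - 1.
var_shortcut = (sum(x ** 2 for x in lengths) - n * x_bar ** 2) / (n - 1)

print(math.isclose(var_definition, var_shortcut))
```

The two forms agree algebraically, but in floating-point arithmetic the shortcut can lose precision when the mean is large relative to the spread, so the definition is usually the safer choice in code.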


The Coefficient of Variation

The coefficient of variation (CV) is the standard deviation divided by the mean, often expressed as a percentage:

$$CV = 100 \cdot \frac{s}{\bar{x}}$$

The coefficient of variation is dimensionless and can be used to compare among data sets based on different units.
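The unit-independence of the CV can be illustrated with a short Python sketch that computes it for the bowhead lengths in metres and again in centimetres. The function name is hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def coefficient_of_variation(data):
    """CV as a percentage: 100 * sample SD / mean."""
    return 100 * statistics.stdev(data) / statistics.mean(data)

lengths_m = [8.5, 8.4, 13.8, 9.3, 9.7]        # bowhead lengths in metres
lengths_cm = [100 * x for x in lengths_m]     # the same lengths in centimetres

cv_m = coefficient_of_variation(lengths_m)
cv_cm = coefficient_of_variation(lengths_cm)
print(round(cv_m, 1), round(cv_cm, 1))
```

Both calls give a CV of about 22.4%, because rescaling multiplies the standard deviation and the mean by the same factor.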


Z-Scores

The z-score is calculated using the equation:

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$$
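A minimal Python sketch of the z-score calculation follows. It uses the sample mean and sample standard deviation of the bowhead lengths as stand-ins for the mean and standard deviation in the formula. That substitution is an assumption of this example, and the function name is hypothetical.

```python
import statistics

def z_scores(data):
    """z = (value - mean) / SD, here using the sample mean and sample SD."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [(x - mean) / sd for x in data]

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]  # bowhead lengths (m)
z = z_scores(lengths)
for x, zi in zip(lengths, z):
    print(f"{x:5.1f} m  z = {zi:+.2f}")
```

The 13.8 m whale has a z-score of about +1.73, i.e. it lies roughly 1.7 standard deviations above the mean.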


Outliers-I

Outliers can lead to misinterpretation of results. They can arise because of data errors (e.g. typing measurements in cm rather than in m) or because of unusual events.

There are several rules for identifying outliers; one based on the quartiles Q1, Q2 and Q3 is:
Outliers: < Q2 − 6(Q2 − Q1); > Q2 + 6(Q3 − Q2)
Strays: < Q2 − 3(Q2 − Q1); > Q2 + 3(Q3 − Q2)
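The quartile rules above can be sketched in Python as follows, applied to the bowhead lengths with an extra length of 1 added. The function name `classify` is hypothetical, and the quartiles come from `statistics.quantiles` with its default "exclusive" method (Python 3.8+). Other quartile conventions give somewhat different cut-offs, so treat the result as illustrative.

```python
import statistics

def classify(data):
    """Label each value 'outlier', 'stray' or 'ok' using the quartile rules."""
    # Assumption: default 'exclusive' quartile method of statistics.quantiles.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    lower_spread = q2 - q1
    upper_spread = q3 - q2
    labels = []
    for x in data:
        if x < q2 - 6 * lower_spread or x > q2 + 6 * upper_spread:
            labels.append("outlier")
        elif x < q2 - 3 * lower_spread or x > q2 + 3 * upper_spread:
            labels.append("stray")
        else:
            labels.append("ok")
    return labels

lengths = [8.5, 8.4, 13.8, 9.3, 9.7, 1.0]  # bowhead lengths (m) plus a length of 1
labels = classify(lengths)
print(list(zip(lengths, labels)))
```

Under these assumptions the added value of 1 is flagged as a stray, while the original five lengths are left unflagged.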


Outliers-II

Strays and outliers should be indicated on box and whisker plots. Consider the data set of bowhead lengths, except that a length of 1 is added!

[Figure: box and whisker plots of the bowhead lengths, with axes labelled Length (m), marked 5 to 15, and Length (cm), marked 0 to 150.]


Review of Symbols in this Lecture

$\sigma^2$: population variance ($\sigma$ is the Greek letter sigma).

$\sigma$: population standard deviation.

$s^2$: sample variance.

$s$: sample standard deviation.


Summary

We use descriptive statistics to “get a feel for the data” (also called “exploratory data analysis”). In general, we are using statistics from the sample to learn something about the population.
