
Descriptive Statistics-IV (Measures of Variation)

QSCI 381 – Lecture 6 (Larson and Farber, Sect 2.4)


Deviation, Variance and Standard Deviation-I

The deviation of a data entry $x_i$ in a population data set is the difference between $x_i$ and the population mean $\mu$, i.e.

$$\text{deviation of } x_i = x_i - \mu$$

The sum of the deviations over all entries is zero.

The population variance is the sum of the squared deviations over all entries, divided by the number of entries $N$:

$$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$$

$\sigma$ is the Greek letter sigma.


Deviation, Variance and Standard Deviation-II

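The population standard deviation is the square root of the population variance, i.e.:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_i (x_i - \mu)^2}{N}}$$

Note: these quantities relate to the population and not a sample from the population.

Note: the standard deviation is sometimes loosely referred to as the standard error, although strictly the standard error is the standard deviation of an estimate such as a sample mean.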
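The population formulas can be checked numerically. The following is a minimal Python sketch; the five-value data set and the function names `population_variance` and `population_sd` are made up for illustration and do not come from the lecture.

```python
import math

def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_sd(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

# Hypothetical population of five values (illustrative only).
population = [2.0, 4.0, 4.0, 6.0, 9.0]
mu = sum(population) / len(population)
deviations = [x - mu for x in population]

print(sum(deviations))                  # the deviations sum to zero
print(population_variance(population))  # sigma squared
print(population_sd(population))        # sigma
```

Here the mean is 5, the squared deviations sum to 28, and so the population variance is 28 / 5 = 5.6.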


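The Sample Variance and Standard Deviation

The sample variance and the sample standard deviation of a data set with n entries are given by:

$$\text{sample variance} = s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$$

$$\text{sample standard deviation} = s = \sqrt{s^2} = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum_i x_i^2 - n\bar{x}^2}{n - 1}}$$

Note the division by n − 1 rather than N or n.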


Calculating Standard Deviations

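| Step | Population | Sample |
| --- | --- | --- |
| Find the mean | $\mu = \sum_i x_i / N$ | $\bar{x} = \sum_i x_i / n$ |
| Find the deviation for each entry | $x_i - \mu$ | $x_i - \bar{x}$ |
| Square each deviation | $(x_i - \mu)^2$ | $(x_i - \bar{x})^2$ |
| Add to get the sum of squares ($SS_x$) | $SS_x = \sum_i (x_i - \mu)^2$ | $SS_x = \sum_i (x_i - \bar{x})^2$ |
| Divide by N or (n − 1) to get the variance | $\sigma^2 = SS_x / N$ | $s^2 = SS_x / (n - 1)$ |
| Take the square root to get the standard deviation | $\sigma = \sqrt{SS_x / N}$ | $s = \sqrt{SS_x / (n - 1)}$ |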


Example

Find the standard deviation of the following bowhead lengths (in m): (8.5, 8.4, 13.8, 9.3, 9.7)

Key question (before doing anything) – is this a sample or a population?
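One way to work this example is to follow the calculation step by step (mean, deviations, squares, sum of squares, variance, square root) and compute both answers, so the effect of the sample/population choice is visible. This is a sketch rather than the lecture's own solution; the function name `standard_deviation` and its `is_sample` flag are hypothetical.

```python
import math
import statistics

lengths_m = [8.5, 8.4, 13.8, 9.3, 9.7]  # bowhead lengths (m) from the example

def standard_deviation(values, is_sample):
    """Step-by-step standard deviation; divides by n - 1 for a sample, N otherwise."""
    mean = sum(values) / len(values)           # find the mean
    deviations = [x - mean for x in values]    # deviation of each entry
    squared = [d ** 2 for d in deviations]     # square each deviation
    ss_x = sum(squared)                        # sum of squares, SSx
    divisor = len(values) - 1 if is_sample else len(values)
    variance = ss_x / divisor                  # variance
    return math.sqrt(variance)                 # standard deviation

s = standard_deviation(lengths_m, is_sample=True)
sigma = standard_deviation(lengths_m, is_sample=False)
print(round(s, 3), round(sigma, 3))  # roughly 2.226 and 1.991

# Cross-check against the standard library.
print(statistics.stdev(lengths_m), statistics.pstdev(lengths_m))
```

The mean is 9.94 m and the sum of squares is 19.812, so the two answers differ only in the divisor (4 versus 5). If the five whales are treated as a sample from a larger population, the first value is the appropriate one.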


Formulae in EXCEL

Calculating means: =AVERAGE(A1:A10)

Calculating standard deviations: =STDEV(A1:A10) – this calculates the sample and not the population standard deviation! (=STDEVP(A1:A10) gives the population standard deviation.)


Standard Deviations-I

[Figure: three histograms of frequency against data value (horizontal axis from -5 to 25), with SD = 0, SD = 2.1 and SD = 5.3.]


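Standard Deviations-II (Symmetric Bell-shaped distributions)

[Figure: symmetric bell-shaped curve of data values with the horizontal axis marked at -3SD, -2SD, -SD, SD, 2SD and 3SD. About 68% of the data lie within one SD of the mean (34% on each side), 95% within two SDs (a further 13.5% on each side) and 99.7% within three SDs.]

Chebychev’s Theorem: The proportion of the data lying within k standard deviations (k > 1) of the mean is at least $1 - 1/k^2$.

k = 2: proportion ≥ 75%
k = 3: proportion ≥ 88.9%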
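The bound in Chebychev's theorem can be compared with what is observed in a data set. The sketch below uses a made-up, strongly skewed data set and the hypothetical helper names `chebyshev_lower_bound` and `proportion_within`; it uses the population standard deviation (division by N).

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def proportion_within(values, k):
    """Observed proportion of values within k population SDs of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return sum(1 for x in values if abs(x - mean) <= k * sd) / n

# Hypothetical skewed data set (illustrative only).
data = [1, 1, 2, 2, 3, 3, 4, 50]

for k in (2, 3):
    print(k, chebyshev_lower_bound(k), proportion_within(data, k))
```

The theorem holds for any distribution, which is why the bound is much weaker than the 95% and 99.7% figures for symmetric bell-shaped distributions.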


Standard Deviations-III (Grouped data)

The standard deviation of a frequency distribution is:

$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2 f_i}{n - 1}} = \sqrt{\frac{\sum_i (x_i - \bar{x})^2 f_i}{\sum_i f_i - 1}}$$

where $x_i$ is the i-th value in the frequency distribution, and $f_i$ is the frequency of the i-th data entry.

Note: where the frequency distribution consists of bins that are ranges, $x_i$ should be the midpoint of bin i (be careful of the first and last bins).
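A short Python sketch of the grouped-data formula follows. The values and frequencies are invented for illustration, the function name `grouped_sd` is hypothetical, and the mean is assumed to be the frequency-weighted mean of the $x_i$.

```python
import math
import statistics

def grouped_sd(values, frequencies):
    """Sample standard deviation of a frequency distribution.

    values      -- the x_i (bin midpoints when the bins are ranges)
    frequencies -- the f_i, the number of entries with each value
    """
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / n
    ss = sum((x - mean) ** 2 * f for x, f in zip(values, frequencies))
    return math.sqrt(ss / (n - 1))

# Hypothetical frequency distribution: value 1 once, 2 twice, 3 once.
values = [1, 2, 3]
frequencies = [1, 2, 1]
print(grouped_sd(values, frequencies))

# The same answer comes from expanding the distribution into raw data.
print(statistics.stdev([1, 2, 2, 3]))
```

When the $x_i$ are exact values the grouped result matches the raw-data result; when they are bin midpoints it is only an approximation.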


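Standard Deviations-IV (The shortcut formula)

$$
\begin{aligned}
s^2 &= \frac{1}{n-1} \sum_i (x_i - \bar{x})^2 \\
    &= \frac{1}{n-1} \sum_i \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right) \\
    &= \frac{1}{n-1} \left( \sum_i x_i^2 - 2\bar{x} \sum_i x_i + \sum_i \bar{x}^2 \right) \\
    &= \frac{1}{n-1} \left( \sum_i x_i^2 - 2\bar{x} \cdot n \cdot \bar{x} + n \bar{x}^2 \right) \\
    &= \frac{1}{n-1} \left( \sum_i x_i^2 - n \bar{x}^2 \right)
\end{aligned}
$$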
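The equivalence of the two forms can be confirmed numerically. This sketch applies both to the bowhead lengths used elsewhere in the lecture; the function names are hypothetical.

```python
def variance_definition(values):
    """Sample variance from the squared deviations about the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def variance_shortcut(values):
    """Sample variance from the sum of squares minus n times the squared mean."""
    n = len(values)
    mean = sum(values) / n
    return (sum(x ** 2 for x in values) - n * mean ** 2) / (n - 1)

lengths_m = [8.5, 8.4, 13.8, 9.3, 9.7]
print(variance_definition(lengths_m))
print(variance_shortcut(lengths_m))
```

The two results agree up to floating-point rounding. In floating-point arithmetic the shortcut form can lose precision when the mean is very large relative to the spread, so the definitional form is usually the safer choice in code.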


The Coefficient of Variation

The coefficient of variation is the standard deviation divided by the mean, often expressed as a percentage:

$$CV = 100 \, \frac{s}{\bar{x}}$$

The coefficient of variation is dimensionless and can be used to compare among data sets based on different units.
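To illustrate that the coefficient of variation does not depend on units, the sketch below computes it for the bowhead lengths in metres and again after converting to centimetres. The function name `coefficient_of_variation` is hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

lengths_m = [8.5, 8.4, 13.8, 9.3, 9.7]
lengths_cm = [100 * x for x in lengths_m]

cv_m = coefficient_of_variation(lengths_m)
cv_cm = coefficient_of_variation(lengths_cm)
print(round(cv_m, 2), round(cv_cm, 2))  # both about 22.39 (percent)
```

The standard deviation and the mean are both multiplied by 100 in the conversion, so the ratio is unchanged.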


Z-Scores

The z-score is calculated using the equation:

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$$
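A minimal Python sketch of the z-score calculation follows. The function names are hypothetical, and the bowhead lengths are treated as a population here purely for illustration.

```python
import statistics

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def z_scores(values):
    """z-scores of every entry, using the population mean and SD of the data."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [z_score(x, mean, sd) for x in values]

lengths_m = [8.5, 8.4, 13.8, 9.3, 9.7]
scores = z_scores(lengths_m)
print([round(z, 2) for z in scores])
```

A positive z-score means the value is above the mean and a negative one means it is below; the z-scores of a data set always average to zero.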


Outliers-I

Outliers can lead to misinterpretation of results. They can arise because of data errors (e.g. typing measurements in cm rather than in m) or because of unusual events.

There are several rules for identifying outliers. One rule, where Q1, Q2 (the median) and Q3 are the quartiles, is:

Outliers: < Q2 − 6(Q2 − Q1); > Q2 + 6(Q3 − Q2)
Strays: < Q2 − 3(Q2 − Q1); > Q2 + 3(Q3 − Q2)
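The stray/outlier rule can be sketched in Python as follows. The function names and the label "typical" are hypothetical. Quartile definitions vary between textbooks and software, so using the default method of `statistics.quantiles` (Python 3.8+) is an assumption, and borderline values may be classified differently under another convention.

```python
import statistics

def classify(x, q1, q2, q3):
    """Label x as 'outlier', 'stray' or 'typical' given the three quartiles."""
    if x < q2 - 6 * (q2 - q1) or x > q2 + 6 * (q3 - q2):
        return "outlier"
    if x < q2 - 3 * (q2 - q1) or x > q2 + 3 * (q3 - q2):
        return "stray"
    return "typical"

def classify_all(values):
    """Classify every value using quartiles computed from the data itself."""
    # Assumption: default ('exclusive') quartile method of statistics.quantiles.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return [(x, classify(x, q1, q2, q3)) for x in values]

# Bowhead lengths (m) with an extra, suspicious length of 1 added.
lengths_m = [8.5, 8.4, 13.8, 9.3, 9.7, 1.0]
for value, label in classify_all(lengths_m):
    print(value, label)
```

The outlier test is applied first because any value beyond the 6-times limit is also beyond the 3-times limit.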


Outliers-II

Strays and outliers should be indicated on box and whisker plots.

Consider the data set of bowhead lengths, except that a length of 1 is added!

[Figure: box and whisker plots, with axes labelled Length (cm) and Length (m).]


Review of Symbols in this Lecture

$\sigma^2$: population variance ($\sigma$ is the Greek letter sigma).

$\sigma$: population standard deviation.

$s^2$: sample variance.

$s$: sample standard deviation.


Summary

We use descriptive statistics to “get a feel for the data” (also called “exploratory data analysis”). In general, we are using statistics from the sample to learn something about the population.