MEASURES OF VARIABILITY · Measures of central tendency alone cannot completely characterize a set...

MEASURES OF

VARIABILITY

Variance and Standard Deviation

Measures of central tendency alone cannot completely characterize a set of data. Two very different data sets may have similar measures of central tendency.

Measures of variability or dispersion are used to describe the spread, or variability, of a distribution

Common measures of dispersion: range, variance, and standard deviation

Measures of Variability or

Dispersion

Why Study Dispersion?

For example, if your nature guide told you that

the river ahead averaged 3 feet in depth,

would you want to wade across on foot without

additional information? Probably not. You

would want to know something about the

variation in the depth.

A second reason for studying the dispersion

in a set of data is to compare the spread in two

or more distributions.

Observe two hypothetical

data sets:

The average value provides

a good representation of the

observations in the data set.

Small variability

This data set is now

changing to...

Observe two hypothetical

data sets:

The average value provides

a good representation of the

observations in the data set.

Small variability

Larger variability

The same average value does not

provide as good representation of the

observations in the data set as before.

Samples of Dispersions

Measures of Dispersion

Mean Deviation

Variance and

Standard Deviation

EXAMPLE – Range

The number of cappuccinos sold at the Starbucks location in the Orange Country Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50, 60, and 80. Determine the mean deviation for the number of cappuccinos sold.

Range = Largest – Smallest value

= 80 – 20 = 60

range H L

Range: The difference in value between the highest-

valued (H) and the lowest-valued (L) pieces of data:

Example: Consider the sample {12, 23, 17, 15, 18}.

Find the range

Solutions:

range H L 23 12 11

Skor (X)

Kekerapan (f )

N = f = 20

fX= 1424

Ungrouped Data

Range for Ungrouped Data

R = H – L H = highest value

L = lowest value

R = 89 – 50 = 39

Range for Ungrouped Data

For Grouped Data Class

Interval

Class Limit Class Mid-

point (m)

Frequency (less

than UCL) (f)

50 – 54

49.5 – 54.5

55 – 59

54.5 – 59.5

60 – 64

59.5 – 64.5

65 – 69

64.5 – 69.5

70 – 74

69.5 – 74.5

75 – 79

74.5 – 79.5

80 – 85

79.5 – 84.5

85 – 89

84.5 – 89.5

N = f = 20

Range for Grouped Data

For Grouped Data in a Frequency Distribution

Range = UCL of Highest Class – LCL of Lowest Class

Range = 89.5 – 49.5 = 40

Range for Grouped Data

The range of a set of observations is the

difference between the largest and

smallest observations.

Its major advantage is the ease with which it can be computed.

Its major shortcoming is its failure to

provide information on the dispersion of the

observations between the two end points.

But, how do all the observations spread out?

Smallest

observation

Largest

observation

The range cannot assist in answering this question Range

Semi Inter-Quartile Range

There are three quartiles:

First Quartile, Q1, at the 25% frequency

Second Quartile, Q2, at the 50% frequency, equals the median

Third Quartile, Q3, at the 75% frequency

Semi Inter-Quartile Range, Q = ( Q3 - Q1 )/2

Q = (77.0 – 65.5) = 5.75

• Other measures of dispersion are based on the following

quantity

Deviation from the Mean: A deviation from the

mean, ,is the difference between the value of x and

the mean .

Deviation from the Mean

Example: Consider the sample {12, 23, 17, 15, 18}.

Find each deviation from the mean.

Solutions:

Data Deviation from Mean

_________________________

512 23 17 15 18 17( )

Deviation from the Mean

Consider two small populations:

10 9 8

7 4 10

8-10= -2

9-10= -1

11-10= +1

12-10= +2

4-10 = - 6

7-10 = -3

13-10 = +3

16-10 = +6

Sum = 0

The mean of both

populations is 10...

…but measurements in B

are more dispersed

then those in A.

A measure of dispersion

Should agrees with this

observation.

Can the sum of deviations

Be a good measure of dispersion?

The sum of deviations is

zero for both populations,

therefore, is not a good

measure of dispersion.

Why not use the sum of deviations

from the mean?

Mean Deviation

The number of cappuccinos sold at the Starbucks location in the Orange Country Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50, 60, and 80. Determine the mean deviation for the number of cappuccinos sold.

This measure reflects the dispersion of all the

observations

The variance of a population of size N x1, x2,…,xN

whose mean is m is defined as

The variance of a sample of n observations

x1, x2, …,xn whose mean is is defined as x

)x( 2i

N1i2 m

The Variance

Population Variance, 2 = (x - m)2 ,

m = population mean

N = population size

Population Standard Deviation, = 2

Population Variance

Sample Variance: The sample variance, s2, is the mean

of the squared deviations, calculated using n 1 as the

divisor:

x x 2 2 1

( ) where n is the sample size

SS( ) ( ) SS ( ) ( ) x x x x

n x 2 2 2 1

Note: The numerator for the sample variance is called the

sum of squares for x, denoted SS(x):

Sample Variance

Let us calculate the variance of the two populations

)1016()1013()1010()107()104( 222222B

)1012()1011()1010()109()108( 222222A

Why is the variance defined as

the average squared deviation?

Why not use the sum of squared

deviations as a measure of

variation instead?

After all, the sum of squared

deviations increases in

magnitude when the variation

of a data set increases!!

Population Variance – Example

Example:

The following sample consists of the number of jobs six students applied for: 17, 15, 23, 7, 9, 13. Finds its mean and variance

Solution:

jobs2.33

)1413...()1415()1417(16

jobs146

1397231517

Sample Variance – Example

The shortcut formula for the sample variance:

Sample Variance – Shortcut

Formula

( ) ( )

jobs2.33

13...151713...1517

Sample Variance – Shortcut

Formula

Sample Variance – Example 2

The hourly wages for a sample of part-time employees at Home Depot are: $12, $20, $16, $18, and $19. What is the sample variance?

The standard deviation of a set of

observations is the positive square root of

the variance .

:deviationandardstPopulation

ss:deviationstandardSample

The unit of measure for the standard deviation

is the same as the unit of measure for the data

Standard Deviation

Example: Find the (1) variance and (2) standard

deviation for the data {5, 7, 1, 3, 8}:

2 . 8 ) 8 . 32 ( 4

1 2 s 1) 86 . 2 2 . 8 s 2)

x x x ( ) x x 2

24 0 32.08 Sum:

Solutions:

5 7 1 3 8 4 8 ( ) . First:

Standard Deviation

The number of traffic citations issued during the last

five months in Beaufort County, South Carolina, is 38,

26, 13, 41, and 22. What is the population variance?

EXAMPLE – Variance and

Standard Deviation

EXAMPLE – Sample Variance

The hourly

wages for a

sample of part-

time employees

at Home Depot

are: $12, $20,

$16, $18, and

$19. What is

the sample

variance?

If the data is given in the form of a

frequency distribution, we need to make

a few changes to the formulas for the

mean, variance, and standard deviation

Complete the extension table in order to

find these summary statistics

Calculation of Mean, Variance and

Standard Deviation

Population variance, 2 =

Sample variance, s2 =

Population standard deviation, =

Sample standard deviation, s =

Variance and Standard Deviation

for Ungrouped Data

X = 1 424

X2 = 103 120

X2 = 103120; (X)2 = 2 027 776

= 86.56

= = 9.30

2 027 776103120

For Ungrouped Data

Example: A survey of students in the first grade at a local school asked for the number of brothers and/or sisters for each child. The results are summarized in the table below. Find (1) the mean, (2) the variance, and (3) the standard deviation:

2239 93

6262 1

.2) s 163 128. .3) x 93 62 15/ .1)

Solutions:

62 93 Sum:

15 0 0

x f xf x f2

For a Frequency Distribution

Standard Deviation of Grouped

Data - Example Refer to the frequency distribution for the Whitner

Autoplex data used earlier. Compute the standard deviation of the vehicle selling prices

The standard deviation can be used to

compare the variability of several distributions

make a statement about the general shape of a distribution.

The empirical rule: If a sample of observations has a mound-shaped distribution, the interval

tsmeasuremen the of 68%ely approximat contains )sx,sx(

tsmeasuremen the of 95%ely approximat contains )s2x,s2x(

tsmeasuremen the of 99.7%ely approximat contains )s3x,s3x(

Interpreting Standard Deviation

Standard deviation is a measure of

variability, or spread

Two rules for describing data rely on the

standard deviation:

Empirical rule: applies to a variable that is

normally distributed

Chebyshev’s theorem: applies to any

distribution

Interpreting Standard Deviation

The Empirical Rule

Notes:

The empirical rule is more informative than Chebyshev’s theorem since we know more about the distribution (normally distributed)

Also applies to populations

Can be used to determine if a distribution is normally distributed

1. Approximately 68% of the observations lie within 1

standard deviation of the mean

2. Approximately 95% of the observations lie within 2

standard deviations of the mean

3. Approximately 99.7% of the observations lie within 3

standard deviations of the mean

Empirical Rule: If a variable is normally distributed, then:

The Empirical Rule

xx s x sx s2 x s2x s3 x s3

The Empirical Rule

1) What percentage of weights fall between 5.7 and 7.3?

2) What percentage of weights fall above 7.7?

Example: A random sample of plum tomatoes was selected from a local grocery store and their weights recorded. The mean weight was 6.5 ounces with a standard deviation of 0.4 ounces. If the weights are normally distributed:

Solutions:

( , ) ( . (0. ), . (0. )) ( . , . ) x s x s 2 2 6 5 2 4 6 5 2 4 5 7 7 3 Approximat ely 95% of the weigh ts fall be tween 5.7 and 7.3

( , ) ( . (0. ), . (0. )) ( . , . ) x s x s 3 3 6 5 3 4 6 5 3 4 5 3 7 7 Approximat ely 99.7% of the wei ghts fall between 5. 3 and 7.7 Approximat ely 0.3% of the weigh ts fall out side (5.3, 7.7) Approximat ely (0.3 / 2) =0.15% of t he weights fall abov e 7.7

Empirical Rule – Example

1. Find the mean and standard deviation for the data

2. Compute the actual proportion of data within 1, 2, and 3 standard deviations from the mean

3. Compare these actual proportions with those given by the empirical rule

4. If the proportions found are reasonably close to those of the empirical rule, then the data is approximately normally distributed

Note: The empirical rule may be used to determine whether or not a set of data is approximately normally distributed

Empirical Rule

Example: A statistics practitioner wants

to describe the way returns on

investment are distributed.

The mean return = 10%

The standard deviation of the return = 8%

The histogram is bell shaped.

More Examples

Example: – solution

The empirical rule can be applied (bell shaped histogram)

Describing the return distribution Approximately 68% of the returns lie between 2% and

18% [10 – 1(8), 10 + 1(8)]

Approximately 95% of the returns lie between -6% and 26% [10 – 2(8), 10 + 2(8)]

Approximately 99.7% of the returns lie between -14% and 34% [10 – 3(8), 10 + 3(8)]

More Examples

Chebyshev’s Theorem

The arithmetic mean biweekly amount contributed by the Dupree Paint employees to the company’s profit-sharing plan is $51.54, and the standard deviation is $7.51. At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean?

Chebyshev’s Theorem: The proportion of any distribution that lies within k standard deviations of the mean is at least 1 (1/k2), where k is any positive number larger than 1. This theorem applies to all distributions of data.

at least

xx ks x ks

Illustration:

The proportion of observations in any sample that lie within

k standard deviations of the mean is at least

1-1/k2 for k > 1.

This theorem is valid for any set of measurements (sample,

population) of any shape!!

K Interval Chebyshev Empirical Rule

1 at least 0% approximately 68%

2 at least 75% approximately 95%

3 at least 89% approximately 99.7%

s2x,s2x

s3x,s3x

(1-1/12)

(1-1/22)

(1-1/32)

Chebyshev’s theorem is very conservative and

holds for any distribution of data

Chebyshev’s theorem also applies to any

population

The two most common values used to describe a

distribution of data are k = 2, 3

1.7 2 2.5 3

0.65 0.75 0.84 0.89

1 1 2( / )k

The table below lists some values for k and 1 - (1/k2):

Example: At the close of trading, a random sample of 35 technology stocks was selected. The mean selling price was 67.75 and the standard deviation was 12.3. Use Chebyshev’s theorem (with k = 2, 3) to describe the distribution.

)104.65 ,85.30()3.12(375.67 ),3.12(375.67()3 ,3( sxsx

Using k=3: At least 89% of the observations lie within 3

standard deviations of the mean:

Solutions:

)35.92 ,15.43()3.12(275.67 ),3.12(275.67()2 ,2( sxsx

Using k=2: At least 75% of the observations lie within 2 standard

deviations of the mean:

Chebyshev’s Theorem – Example

Example: The annual salaries of the employees of a chain of computer

stores produced a positively skewed histogram. The mean and

standard deviation are $28,000 and $3,000, respectively. What

can you say about the salaries at this chain?

Solution:

At least 75% of the salaries lie between $22,000 and $34,000

28000 – 2(3000) 28000 + 2(3000)

At least 88.9% of the salaries lie between $19,000 and $37,000

28000 – 3(3000) 28000 + 3(3000)

Chebyshev’s Theorem – Example

Normal Distribution

median

Frequency

Normal Distribution

Frequency

Normal Distribution

MEASURES OF VARIABILITY · Measures of central tendency alone cannot completely characterize a set...

Documents

MEASURES OF CENTRAL TENDENCY AND DISPERSION · MEASURES OF CENTRAL TENDENCY AND DISPERSION CHAPTER 15 UNIT I: MEASURES OF CENTRAL TENDENCY After reading this chapter, students will

MEASURES OF CENTRAL TENDENCY - …carlprosper4nugs.yolasite.com/resources/Stats notes...Definition-measures of central tendency •Central tendency is a statistical measure that identifies

Measures of central tendency

Chapter 3.4 Measures of Central Tendency Measures of Central Tendency

Measures of Central Tendency

Chapter 3 Measures of Central Tendency. 3.1 Defining Central Tendency Central tendency Purpose:

Measures of Central Tendency Measures of Location Measures

Chapter 3 Descriptive Measures Measures Central Tendency MeanMedianModeDispersionRange Variance & Standard Deviation Measures Central Tendency MeanMedianModeDispersionRange

Measures of Central Tendency. What Are Measures of Central Tendency?

MEASURES OF CENTRAL TENDENCY OR AVERAGES · MEASURES OF CENTRAL TENDENCY OR AVERAGES MEASURES OF CENTRAL TENDENCY OR LOCATION A single expression, representing the whole group, is

Elementary Statistics Measures of Central tendency …srinagarwomenscollege.com/ebmedia/sitename_eb/wp-content/upload… · Measures of Central tendency In statistics, a central tendency

MEASURES OF CENTRAL TENDENCYoms.bdu.ac.in/ec/admin/contents/427_16CCCCM8... · central tendency and measures put forward to measure these tendency are called measures of central tendency

Basic concept Measures of central tendency Measures of central tendency Measures of dispersion & variability

Measures of Central Tendency - chromeias.com€¦ · 1. Measures of Central Tendency 2. Measures of Dispersion 3. Measures of Relationship While measures of central tendency provide