Actual Dispersion

Measures of Dispersion

Definition

While measures of central tendency indicate what value of a variable is (in one sense or other) “average” or “central” or “typical” in a set of data, measures of dispersion (or variability or spread) indicate (in one sense or other) the extent to which the observed values are “spread out” around that center, that is, how “far apart” observed values typically are from each other or from some average value (in particular, the mean). Thus: if all cases have identical observed values (and thereby are also identical to [any] average value), dispersion is zero; if most cases have observed values that are quite “close together” (and thereby are also quite “close” to the average value), dispersion is low (but greater than zero); and if many cases have observed values that are quite “far away” from many others (or from the average value), dispersion is high.

Dispersion:

■ Is a synonym for variability
■ Is often called “spread” or “scatter”
■ Is an indicator of consistency within a data set
■ Indicates how closely data are clustered about a measure of central tendency

Example

Consider the following data on the age distribution of two groups, A and B:

Group   Ages (years)        Average
A       22  24  25  26  28  25
B        8  15  20  28  54  25

The two groups above have the same average, i.e. 25 years, so we are likely to conclude that the two groups are similar.

This would be a wrong conclusion, because the observations in group A are close to one another, indicating that people in this group are all roughly between 22 and 28 years old.

Those in group B, by contrast, are widely dissimilar and have greater variability of ages: the group includes a person who is 8 years old on the one hand and a person aged 54 on the other.

This means that the central value alone does not give a clear indication of the pattern of the distribution.

A measure of dispersion or variability gives us information about the spread of the observations in a distribution.

Here, the dispersion of group B is greater than that of group A.
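The comparison of groups A and B can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `mean` and `value_range` are made up for this example, and the range is used here simply as the quickest measure of spread.

```python
# Ages (in years) for the two groups from the example above.
group_a = [22, 24, 25, 26, 28]
group_b = [8, 15, 20, 28, 54]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

for name, ages in (("A", group_a), ("B", group_b)):
    print(f"Group {name}: mean = {mean(ages)}, range = {value_range(ages)}")
# Group A: mean = 25.0, range = 6
# Group B: mean = 25.0, range = 46
```

Both groups share the same mean, yet the spread of group B is far larger, which is exactly the information the average hides.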

Purpose of Measuring Variation

■ To test the reliability of an average
■ To serve as a basis for control of variability
■ To compare two or more series with regard to variability
■ To serve as a basis for further statistical analysis

Properties of a good measure of variation

■ It should be simple to understand and easy to calculate.
■ It should be based on all observations.
■ It should be amenable to further algebraic treatment.
■ It should not be affected by extreme observations.

Measures of variation

Absolute measures
■ Range
■ Quartile deviation
■ Mean deviation
■ Standard deviation / variance
■ Lorenz curve

Relative measures
■ Coefficient of range
■ Coefficient of variation
■ Coefficient of quartile deviation
■ Coefficient of mean deviation

Absolute measures of variation

■ They are expressed in the same statistical unit in which the original data are given, such as rupees, kg, etc.
■ These values can be used to compare the variation in two or more distributions, provided the variables are expressed in the same units and have almost the same average value.

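Relative measures of variation

■ An absolute measure of dispersion expresses variation in the same units as the original data, so it cannot be compared directly across series measured in different units or with very different averages.
■ To compare the variation of two different series, a relative measure of dispersion is calculated: a unit-free ratio such as the coefficient of variation (the standard deviation relative to the mean).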

Range

■ Range is the preliminary indicator of dispersion.
■ The (“total” or “simple”) range is the maximum (highest) value observed in the data [the value of the case at the 100th percentile] minus the minimum (lowest) value observed in the data [the value of the case at the 0th percentile].
■ That is, it is the “distance” or “interval” between the values of these two most extreme cases.
■ It indicates how spread out the data are.
■ Open-ended distributions have no range because no highest or lowest value exists in an open-ended class.

The Range

■ The range is defined as the difference between the largest score in the set of data and the smallest score in the set of data, X_L - X_S.
■ What is the range of the following data: 4 8 1 6 6 2 9 3 6 9
■ The largest score (X_L) is 9; the smallest score (X_S) is 1; the range is X_L - X_S = 9 - 1 = 8.

Coefficient of scatter (ratio of range)

Coefficient of range = (Max - Min) / (Max + Min) = Absolute range / Sum of the extreme values

Dispersion Example

Number of minutes 20 clients waited to see a consultant

Consultant X    Consultant Y
05  15          11  12
12  03          10  13
04  19          11  10
37  11          09  13
06  34          09  11

Consultant X:
■ Sees some clients almost immediately
■ Others wait over 1/2 hour
■ Highly inconsistent

Consultant Y:
■ Clients wait about 10 minutes
■ 9 minutes least wait and 13 minutes most
■ Highly consistent

Solution

1. Consultant X: Coefficient of range = (Max - Min) / (Max + Min) = (37 - 03) / (37 + 03) = 34/40 = 0.85
2. Consultant Y: Coefficient of range = (Max - Min) / (Max + Min) = (13 - 09) / (13 + 09) = 4/22 = 0.18

Consultant X is inconsistent and Consultant Y is consistent in their job.
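The range and the coefficient of range for the consultant data can be reproduced with a short sketch. The function name `coefficient_of_range` is hypothetical, and the two lists assume that the first two columns of the waiting-time table belong to Consultant X and the last two to Consultant Y (the reading that matches the maxima and minima used in the solution).

```python
def coefficient_of_range(values):
    """(max - min) / (max + min), a unit-free measure of spread."""
    highest, lowest = max(values), min(values)
    return (highest - lowest) / (highest + lowest)

# Waiting times in minutes (assumed column split, see the note above).
consultant_x = [5, 15, 12, 3, 4, 19, 37, 11, 6, 34]
consultant_y = [11, 12, 10, 13, 11, 10, 9, 13, 9, 11]

print(max(consultant_x) - min(consultant_x))          # 34
print(max(consultant_y) - min(consultant_y))          # 4
print(round(coefficient_of_range(consultant_x), 2))   # 0.85
print(round(coefficient_of_range(consultant_y), 2))   # 0.18

# The earlier ten-score example: range 9 - 1 = 8.
scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]
print(max(scores) - min(scores))                      # 8
```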

Uses of the range

QUALITY CONTROL:
■ The objective of quality control is to keep a check on the quality of the product without 100% inspection.
■ When statistical methods of quality control are used, control charts are prepared in which the range plays an important role.
■ The basic idea is that as long as manufactured products conform to set standards (range), the production process is assumed to be in control.

WEATHER FORECASTS:
■ The range helps the general public to know within what limits the temperature is likely to vary on a particular day.

Quartile deviation

■ It is based on the distance between the lowest and highest of the middle 50 percent of the scores of a distribution.
■ Q.D. is superior to the range, as it is not based on the two extreme values but rather on the middle 50% of the observations.
■ It can be calculated from open-ended classes.
■ It is often used with skewed data, as it is insensitive to extreme scores.

Interquartile Range

■ Interquartile range = Q3 - Q1
■ Semi-interquartile range or quartile deviation is defined as (Q3 - Q1) / 2
■ Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)
■ When Q.D. is small, it describes high uniformity of the central 50% of observations.
■ High Q.D. means high variation among the central observations.

Interquartile Range Example

The number of complaints received by the manager of a supermarket was recorded for each of the last 10 working days.

Data: 21, 15, 18, 5, 10, 17, 21, 19, 25 and 28

Sorted data: 5, 10, 15, 17, 18, 19, 21, 21, 25 and 28

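$$Q_1 = \frac{1\,(n+1)}{4}\text{th observation} = \frac{1\,(10+1)}{4} = 2.75, \text{ or the 3rd observation}, \text{ so } Q_1 = 15$$

$$Q_3 = \frac{3\,(n+1)}{4}\text{th observation} = \frac{3\,(10+1)}{4} = 8.25, \text{ or the 8th observation}, \text{ so } Q_3 = 21$$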

Interquartile range = 21 - 15 = 6 complaints
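The position rule used in this example can be sketched as follows. The function name `quartile_by_position` is hypothetical, and the rounding of the fractional position to the nearest whole observation is the shortcut taken in the example, not an interpolation; statistical software that interpolates between observations will generally report slightly different quartiles.

```python
def quartile_by_position(sorted_values, k):
    """Return the k-th quartile (k = 1 or 3) using the k(n + 1)/4 position rule.

    The fractional position is rounded to the nearest whole observation,
    which is the shortcut used in the example above.
    """
    n = len(sorted_values)
    position = k * (n + 1) / 4          # 1-based position, may be fractional
    nearest = int(position + 0.5)       # round half up to a whole position
    return sorted_values[nearest - 1]   # convert to a 0-based index

complaints = sorted([21, 15, 18, 5, 10, 17, 21, 19, 25, 28])
q1 = quartile_by_position(complaints, 1)
q3 = quartile_by_position(complaints, 3)
print(q1, q3, q3 - q1)   # 15 21 6
```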

Calculating exactly: First Quartile (Q1)

X          f     CF
0 < 20     15    15
20 < 40    60    75
40 < 100   25    100

N/4 = 25th item

This is in the group 20 < 40.
Lower limit (l) is 20
Width of group (i) is 20
Frequency of group (f) is 60
CF of previous group (F) is 15

Formula is:

$$Q_1 = l + i \cdot \frac{\frac{N}{4} - F}{f}$$

$$Q_1 = 20 + 20 \cdot \frac{25 - 15}{60} = 20 + 20 \cdot \frac{10}{60} = 20 + 3.333 = 23.333$$

This means that 25% of the data is below 23.333, or 75% of the data is above 23.333.

Third Quartile (Q3)

Using the same frequency table, 3N/4 = 75th item.

This is in the group 20 < 40.
Lower limit (l) is 20
Width of group (i) is 20
Frequency of group (f) is 60
CF of previous group (F) is 15

Formula is:

$$Q_3 = l + i \cdot \frac{\frac{3N}{4} - F}{f}$$

$$Q_3 = 20 + 20 \cdot \frac{75 - 15}{60} = 20 + 20 \cdot \frac{60}{60} = 20 + 20 = 40$$

So 25% of the data is above this point (i.e. 40).

Interquartile Range and Coefficient of Q.D.

■ Interquartile range = 40 - 23.333 = 16.667
■ Semi-interquartile range or quartile deviation = (Q3 - Q1) / 2 = 16.667 / 2 = 8.333
■ Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1) = 16.667 / 63.333 = 0.26
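The grouped-data quartile formula l + i(kN/4 - F)/f can be written as a small function. This is a sketch under the assumption that classes are given as (lower limit, upper limit, frequency) tuples in ascending order; the name `grouped_quantile` is made up for this illustration.

```python
def grouped_quantile(classes, fraction):
    """Quantile of grouped data by linear interpolation within a class.

    classes: list of (lower_limit, upper_limit, frequency), ascending.
    fraction: 0.25 for Q1, 0.75 for Q3.
    """
    total = sum(freq for _, _, freq in classes)
    target = fraction * total            # N/4 or 3N/4
    cumulative = 0                       # CF of the classes before the current one (F)
    for lower, upper, freq in classes:
        if cumulative + freq >= target:
            width = upper - lower        # i
            return lower + width * (target - cumulative) / freq
        cumulative += freq
    raise ValueError("fraction must be between 0 and 1")

table = [(0, 20, 15), (20, 40, 60), (40, 100, 25)]
q1 = grouped_quantile(table, 0.25)       # about 23.333
q3 = grouped_quantile(table, 0.75)       # 40.0
iqr = q3 - q1
quartile_deviation = iqr / 2
coefficient_qd = iqr / (q3 + q1)
print(round(q1, 3), q3, round(iqr, 3), round(quartile_deviation, 3), round(coefficient_qd, 2))
```

Carrying full precision through the calculation gives a quartile deviation of about 8.333 and a coefficient of about 0.26.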

Example

Use an appropriate measure to evaluate the variation in the following data:

Weekly income (Rs.)   No. of workers
below 1350             8
1350-1370             16
1370-1390             39
1390-1410             58
1410-1430             60
1430-1450             40
1450-1470             22
1470-1490             15
1490-1510             15
1510-1530              9
1530 and above        10
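Because the first and last income classes are open-ended, one reasonable choice here is the quartile deviation, which only needs the classes that contain Q1 and Q3. The sketch below is one possible working, not the only acceptable answer; the name `quartile` and the use of `None` to mark an open-ended limit are assumptions made for this illustration.

```python
# (lower limit, upper limit, workers); None marks an open-ended limit.
income_classes = [
    (None, 1350, 8), (1350, 1370, 16), (1370, 1390, 39), (1390, 1410, 58),
    (1410, 1430, 60), (1430, 1450, 40), (1450, 1470, 22), (1470, 1490, 15),
    (1490, 1510, 15), (1510, 1530, 9), (1530, None, 10),
]

def quartile(classes, k):
    """k-th quartile (k = 1, 2, 3) of grouped data: l + i * (kN/4 - F) / f."""
    total = sum(freq for _, _, freq in classes)
    target = k * total / 4
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= target:
            if lower is None or upper is None:
                raise ValueError("quartile falls in an open-ended class")
            return lower + (upper - lower) * (target - cumulative) / freq
        cumulative += freq
    raise ValueError("k must be 1, 2 or 3")

q1 = quartile(income_classes, 1)          # falls in the class 1390-1410
q3 = quartile(income_classes, 3)          # falls in the class 1430-1450
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2), round(coefficient_qd, 4))
```

With N = 292 workers, N/4 = 73 and 3N/4 = 219, so neither quartile lands in an open-ended class and the calculation goes through.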

Problems with Quartile Deviation

■ It is not based on all the observations
■ Affected by sampling fluctuations
■ Not suitable for further algebraic treatment

Deviation Measures of Dispersion

■ The deviation from the mean for a representative case i is x_i - (mean of x).
■ If almost all of these deviations are small, dispersion is small. If many of these deviations are large, dispersion is large.
■ This suggests we could construct a measure D of dispersion that would simply be the average (mean) of all the deviations.
■ But this will not work because it is a property of the mean that all deviations from it add up to zero.
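That the deviations from the mean always cancel out is easy to confirm numerically. The sketch below reuses the ages of groups A and B from the earlier example; the helper name `deviations_from_mean` is hypothetical.

```python
def deviations_from_mean(values):
    """Signed deviation of each value from the arithmetic mean."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

dev_a = deviations_from_mean([22, 24, 25, 26, 28])
dev_b = deviations_from_mean([8, 15, 20, 28, 54])

print(dev_a, sum(dev_a))   # [-3.0, -1.0, 0.0, 1.0, 3.0] 0.0
print(dev_b, sum(dev_b))   # [-17.0, -10.0, -5.0, 3.0, 29.0] 0.0
```

Group B clearly has much larger deviations, yet a plain average of the signed deviations is zero for both groups, so it cannot distinguish them.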


The Mean Deviation

■ A practical way around this problem is simply to ignore the fact that some deviations are negative while others are positive, by averaging the absolute values of the deviations.
■ This measure (called the mean deviation) tells us the average (mean) amount that the values for all cases deviate (regardless of whether they are higher or lower) from the average (mean) value.
■ Indeed, the mean deviation is an intuitive, understandable, and perfectly reasonable measure of dispersion, and it is occasionally used in research.
■ The mean deviation takes into consideration all of the values.

The Mean Deviation for a Frequency Distribution

If the data are in the form of a frequency distribution, the mean deviation can be calculated using the following formula:

$$MD = \frac{\sum f\,|x - \bar{x}|}{\sum f}$$

Where: f = the frequency of an observation x; n = Σf = the sum of the frequencies.

This measure is an improvement over the previous two measures (range and quartile deviation) in the sense that it considers all observations of a data set.

Coefficient of mean deviation

Coefficient of mean deviation = Mean deviation / Mean

Example

Find the mean deviation for the following distribution of demand for a book.

Quantity demanded   Frequency   fx      |x - mean|   f|x - mean|
(in units) x        f
 6                   4           24     17.6         70.4
12                   7           84     11.6         81.2
18                  10          180      5.6         56.0
24                  18          432      0.4          7.2
30                  12          360      6.4         76.8
36                   7          252     12.4         86.8
42                   2           84     18.4         36.8
Total               60         1416                 415.2

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1416}{60} = 23.6$$

$$MD = \frac{\sum f\,|x - \bar{x}|}{\sum f} = \frac{415.2}{60} = 6.92$$
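The book-demand example can be verified with a short sketch that applies the frequency-distribution formulas directly. The variable names are chosen for this illustration only; the coefficient of mean deviation is added at the end as MD divided by the mean.

```python
quantities = [6, 12, 18, 24, 30, 36, 42]     # x
frequencies = [4, 7, 10, 18, 12, 7, 2]       # f

n = sum(frequencies)                                               # sum of f
sum_fx = sum(f * x for x, f in zip(quantities, frequencies))       # sum of fx
mean = sum_fx / n
sum_abs = sum(f * abs(x - mean) for x, f in zip(quantities, frequencies))
mean_deviation = sum_abs / n
coefficient_md = mean_deviation / mean

print(n, sum_fx, round(mean, 2), round(sum_abs, 1), round(mean_deviation, 2))
# 60 1416 23.6 415.2 6.92
print(round(coefficient_md, 3))   # 0.293
```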

Problems with Mean Deviation

■ Algebraic signs are ignored while taking the deviations of the items.
■ Cannot be computed for distributions with open-ended classes.
■ Not suitable for further mathematical treatment.

Standard Deviation

■ Standard deviation is the most commonly used measure of dispersion.
■ Similar to the mean deviation, the standard deviation takes into account the value of every observation.
■ It is the measure of the degree of dispersion of the data from the mean value.
■ The formula first says to subtract the mean from each of the scores.
■ This difference is called a deviate or a deviation score.
■ The deviate tells us how far a given score is from the typical, or average, score.
■ Thus, the deviate is a measure of dispersion for a given score.
■ It is a statistic that tells us how tightly all the various values are clustered around the mean in a set of data.
■ A large S.D. indicates that the data points are far from the mean.
■ A small S.D. indicates that all the data points cluster closely around the mean.

Standard Deviation

■ It is the positive square root of the arithmetic mean of the squares of the deviations of the observations from their arithmetic mean.
■ Calculation:
1. Calculate the arithmetic mean (AM)
2. Subtract the AM from each individual value
3. Square each deviation (multiply it by itself)
4. Sum (total) the squared deviations
5. Divide the total by the number of values (N)
6. Calculate the square root of the result
■ Formula:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
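The six calculation steps translate almost line for line into code. This is a minimal sketch that divides by n, as in the formula above; the function name `population_sd` is made up for the illustration, and the data are the ages of groups A and B from the earlier example.

```python
import math

def population_sd(values):
    """Standard deviation with divisor n, following the six steps in order."""
    n = len(values)
    mean = sum(values) / n                       # step 1: arithmetic mean
    deviations = [x - mean for x in values]      # step 2: value minus mean
    squared = [d * d for d in deviations]        # step 3: square each deviation
    total = sum(squared)                         # step 4: sum of squares
    variance = total / n                         # step 5: divide by n
    return math.sqrt(variance)                   # step 6: square root

print(population_sd([22, 24, 25, 26, 28]))            # 2.0
print(round(population_sd([8, 15, 20, 28, 54]), 2))   # 15.9
```

Both groups have a mean of 25, but the standard deviation of group B is roughly eight times that of group A.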

The Mean, Deviations, Variance, and SD

■ What is the effect of adding a constant amount to (or subtracting it from) each observed value?
■ What is the effect of multiplying each observed value by (or dividing it by) a constant amount?
■ Adding (subtracting) the same amount to (from) every observed value changes the mean by the same amount but does not change the dispersion (for either range or deviation measures).
■ Multiplying every observed value by the same factor changes the mean by that factor, changes the SD [or MD] by the absolute value of that factor, and changes the variance by that factor squared.
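Both effects can be checked on a small data set with Python's standard `statistics` module. The sketch uses the population forms (`pstdev`, `pvariance`); the shift of 10 and the factor of 3 are arbitrary values chosen for illustration.

```python
import statistics

ages = [22, 24, 25, 26, 28]
shifted = [x + 10 for x in ages]    # add a constant to every value
scaled = [x * 3 for x in ages]      # multiply every value by a constant

for label, data in (("original", ages), ("plus 10", shifted), ("times 3", scaled)):
    print(label,
          statistics.mean(data),        # mean
          statistics.pstdev(data),      # population standard deviation
          statistics.pvariance(data))   # population variance
```

Adding 10 moves the mean from 25 to 35 and leaves the SD at 2 and the variance at 4. Multiplying by 3 moves the mean to 75, the SD to 6, and the variance to 36 (that is, 4 times 3 squared).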

Usefulness of the standard deviation

■ Manufacturers interested in producing items of consistent quality are very much concerned with the S.D.
■ If the mean life of a component is 4 years and the S.D. is very large, this would correspond to many failures long before 4 years.
■ Quality control requires consistency, and consistency requires a relatively small S.D.

Variance

The square of the standard deviation. More useful when we begin analysis rather than description. For a sample, it is computed with n - 1 in the denominator:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

What Does the Variance Formula Mean?

■ Variance is the mean of the squared deviation scores.
■ The larger the variance is, the more the scores deviate, on average, away from the mean.
■ The smaller the variance is, the less the scores deviate, on average, from the mean.
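The standard deviation formula earlier divides by n, while the variance formula divides by n - 1. A short sketch makes the difference concrete; it assumes the usual reading that n gives the population form and n - 1 the sample form, and it cross-checks against Python's standard `statistics` module.

```python
import statistics

ages = [22, 24, 25, 26, 28]
mean = sum(ages) / len(ages)
sum_of_squares = sum((x - mean) ** 2 for x in ages)       # 20.0

population_variance = sum_of_squares / len(ages)          # divisor n     -> 4.0
sample_variance = sum_of_squares / (len(ages) - 1)        # divisor n - 1 -> 5.0

print(population_variance, statistics.pvariance(ages))
print(sample_variance, statistics.variance(ages))
```

The two forms differ noticeably for a data set this small and converge as n grows.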

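Combined Variance (For different means)

$$\sigma_{12}^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$$

where $d_1 = \bar{x}_1 - \bar{x}_{12}$, $d_2 = \bar{x}_2 - \bar{x}_{12}$, and $\bar{x}_{12} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$ is the combined mean.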

Exercise 3

■ The mean and s.d. of the ‘lives’ of tyres manufactured by two factories of the ‘Durable’ tyre company, each making 50,000 tyres annually, are given below. Calculate the combined mean and standard deviation of the ‘life’ of all the 100,000 tyres produced in a year.

Factory   Sample Size   Mean (‘000 Kms)   SD (‘000 Kms)
1         50            60                8
2         50            50                7
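One way to work this exercise is with the combined-variance formula for groups with different means. The sketch below is an illustration, not an official solution; the function name `combined_mean_and_sd` is hypothetical. Since both factories carry equal weight, using the sample size of 50 or the production figure of 50,000 as n gives the same answer.

```python
import math

def combined_mean_and_sd(groups):
    """Combine groups given as (n, mean, sd) tuples.

    Combined variance = sum of n * (sd^2 + d^2) / sum of n,
    where d is the group mean minus the combined mean.
    """
    total_n = sum(n for n, _, _ in groups)
    combined_mean = sum(n * m for n, m, _ in groups) / total_n
    combined_var = sum(
        n * (sd ** 2 + (m - combined_mean) ** 2) for n, m, sd in groups
    ) / total_n
    return combined_mean, math.sqrt(combined_var)

# Factory 1: n = 50, mean = 60, SD = 8; Factory 2: n = 50, mean = 50, SD = 7
mean_all, sd_all = combined_mean_and_sd([(50, 60, 8), (50, 50, 7)])
print(mean_all, round(sd_all, 2))   # 55.0 9.03
```

Here d1 = 5 and d2 = -5, so the combined variance is (50 × 89 + 50 × 74) / 100 = 81.5 and the combined SD is about 9.03 thousand km.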

Combined Variance (For same means)

$$\sigma_{12}^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2}$$

Example

The following data relate to clients obtained by insurance agents during a given period for two types of insurance policies, a child policy and a retirement policy. Calculate the combined S.D.

                                Child policy   Retirement policy
No. of agents                   25             18
Average no. of clients booked   72             64
Variance of the distribution     8              6

The Coefficient of Variation

■ It is the most important relative measure of dispersion.
■ One ratio measure of dispersion/inequality is the coefficient of variation, which is simply the standard deviation divided by the mean (usually expressed as a percentage).
■ It answers the question: how big is the SD relative to the mean?

$$\text{coefficient of variation} = \frac{s}{\bar{x}} \times 100$$

■ It is therefore a useful statistic for comparing the degree of variation from one data series to another.
■ In investment analysis, it helps us to determine how much volatility (risk) we are assuming in comparison to the amount of return we can expect from an investment.
■ The lower the coefficient of variation, the better the risk-return tradeoff.
■ The distribution for which the C.V. is larger is said to be less stable, less uniform, less consistent, and less homogeneous.
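As a quick illustration, the coefficient of variation can be computed for the two age groups from the earlier example. This sketch assumes the sample standard deviation (`statistics.stdev`, divisor n - 1) for s; the function name `coefficient_of_variation` is made up for the example.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

group_a = [22, 24, 25, 26, 28]
group_b = [8, 15, 20, 28, 54]

cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)
print(round(cv_a, 1), round(cv_b, 1))   # 8.9 71.1
```

Group B has the larger C.V., so by the rule above it is the less consistent, less homogeneous group.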

Measure of Skew

Skew is a measure of the asymmetry (lack of symmetry) in the distribution of scores.

[Figure: three distribution curves labelled Positive Skew, Negative Skew, and Normal (skew = 0)]

Measure of Skewness

■ A measure of the skewness of a distribution is given by

$$Sk = \frac{3(\text{mean} - \text{median})}{\text{S.D.}}$$

■ This measure is known as Karl Pearson’s coefficient of skewness and lies between -3 and +3.
■ A distribution is said to be symmetric if mean = median = mode.
■ A distribution is said to be positively skewed if mean > median > mode.
■ A distribution is said to be negatively skewed if mean < median < mode.
■ The closer the coefficient is to zero, the less the skewness. If the coefficient of skewness is 0, the data are exactly balanced.
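Karl Pearson’s coefficient is straightforward to compute. The sketch below assumes the population standard deviation (`statistics.pstdev`) for the S.D. in the formula, since the text does not say which form is intended; the function name `pearson_skewness` is hypothetical, and the data are again the two age groups from the earlier example.

```python
import statistics

def pearson_skewness(values):
    """3 * (mean - median) / SD, using the population SD (an assumption)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    return 3 * (mean - median) / sd

print(pearson_skewness([22, 24, 25, 26, 28]))            # 0.0
print(round(pearson_skewness([8, 15, 20, 28, 54]), 2))   # 0.94
```

Group A is symmetric around 25, so its coefficient is zero. In group B the mean (25) exceeds the median (20), giving a positive coefficient, that is, positive skew.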

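Bell-shaped curve showing the relationship between σ and μ: for a normal distribution, about 68% of the values lie within 1σ of μ, about 95% within 2σ, and about 99.7% within 3σ.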
