tushar-gautam
Measures of Dispersion
Definition
While measures of central tendency indicate what value of a variable is (in one sense or another) “average”, “central” or “typical” in a set of data, measures of dispersion (also called variability or spread) indicate (again in one sense or another) the extent to which the observed values are “spread out” around that center: how “far apart” observed values typically are from each other or from some average value (in particular, the mean). Thus: if all cases have identical observed values (and are therefore also identical to any average value), dispersion is zero; if most cases have observed values that are quite “close together” (and are therefore also quite “close” to the average value), dispersion is low (but greater than zero); and if many cases have observed values that are quite “far away” from many others (or from the average value), dispersion is high.
Measures of Dispersion
■ A synonym for variability
■ Often called “spread” or “scatter”
■ An indicator of consistency within a data set
■ Indicates how closely the data are clustered about a measure of central tendency
Example: Consider the following data on the age distribution of two groups, A and B:

Group   Ages (years)            Average
A       22  24  25  26  28      25
B        8  15  20  28  54      25
Both groups have the same average age, 25 years, so we might be tempted to conclude that the two groups are similar.
That conclusion would be wrong. The observations in group A are close to one another, indicating that the people in this group are all roughly 22 to 28 years old.
Those in group B, by contrast, differ widely and show much greater variability in age: the group includes an 8-year-old at one extreme and a 54-year-old at the other.
This means that a central value alone does not give a clear indication of the pattern of a distribution.
A measure of dispersion (variability) tells us how spread out the observations in a distribution are.
Here, the dispersion of group B is greater than that of group A.
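A quick check in Python (a minimal sketch; the variable names are illustrative, not from the original notes) confirms that the two groups share a mean of 25 years while their ranges differ sharply:

```python
from statistics import mean

# Ages from the table above
group_a = [22, 24, 25, 26, 28]
group_b = [8, 15, 20, 28, 54]

for name, ages in (("A", group_a), ("B", group_b)):
    # Same centre, very different spread
    print(name, "mean =", mean(ages), "range =", max(ages) - min(ages))
```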
Purpose of Measuring Variation
■ To test the reliability of an average
■ To serve as a basis for controlling variability
■ To compare two or more series with regard to their variability
■ To serve as a basis for further statistical analysis
Properties of a good measure of variation
■ It should be simple to understand and easy to calculate.
■ It should be based on all observations.
■ It should be amenable to further algebraic treatment.
■ It should not be affected by extreme observations.
Measures of variation
Absolute measures
■ Range
■ Quartile deviation
■ Mean deviation
■ Standard deviation / variance
■ Lorenz curve (a graphical method)
Relative measures
■ Coefficient of range
■ Coefficient of variation
■ Coefficient of quartile deviation
■ Coefficient of mean deviation
Absolute measures of variation
They are expressed in the same unit as the original data, such as rupees or kilograms.
They can be used to compare the variation in two or more distributions, provided the variables are expressed in the same units and have roughly the same average value.
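Relative measures of variation
An absolute measure of dispersion expresses variation in the same units as the original data, so it cannot directly compare series measured in different units or with very different averages.
A relative measure is a unit-free ratio, usually an absolute measure divided by an average. To compare the variation of two different series, the relative measure based on the standard deviation (the coefficient of variation) is calculated.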
Range
The range is the preliminary indicator of dispersion. The (“total” or “simple”) range is the maximum (highest) value observed in the data [the value of the case at the 100th percentile] minus the minimum (lowest) value observed in the data [the value of the case at the 0th percentile]. That is, it is the “distance” or “interval” between the values of the two most extreme cases.
It indicates how spread out the data are.
Open-ended distributions have no range, because no highest or lowest value exists in an open-ended class.
The Range
The range is defined as the difference between the largest score in the set of data (XL) and the smallest score in the set of data (XS): range = XL - XS.
What is the range of the following data: 4 8 1 6 6 2 9 3 6 9?
The largest score (XL) is 9; the smallest score (XS) is 1; the range is XL - XS = 9 - 1 = 8.
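Coefficient of range (coefficient of scatter)
The coefficient of range is a ratio of the range to the sum of the extremes:
$$\text{Coefficient of range} = \frac{\text{Max} - \text{Min}}{\text{Max} + \text{Min}} = \frac{\text{Absolute range}}{\text{Sum of the extreme values}}$$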
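Both definitions translate directly into code. The sketch below (function names are hypothetical) applies them to the ten scores used in the range example:

```python
def value_range(data):
    """Range = largest value minus smallest value (XL - XS)."""
    return max(data) - min(data)

def coefficient_of_range(data):
    """Coefficient of range = (Max - Min) / (Max + Min), a unit-free ratio."""
    hi, lo = max(data), min(data)
    return (hi - lo) / (hi + lo)

scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]
print(value_range(scores))           # 8
print(coefficient_of_range(scores))  # 0.8
```

Because the coefficient divides by the sum of the extremes, it has no units and can compare series measured on different scales.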
Dispersion Example
Number of minutes 20 clients waited to see a consultant (10 clients per consultant):
Consultant X: 05, 15, 12, 03, 04, 19, 37, 11, 06, 34
Consultant Y: 11, 12, 10, 13, 11, 10, 09, 13, 09, 11
Consultant X: sees some clients almost immediately; others wait over half an hour. Highly inconsistent.
Consultant Y: clients wait about 10 minutes, with 9 minutes the shortest wait and 13 minutes the longest. Highly consistent.
Solution
1. Consultant X: coefficient of range = (Max - Min)/(Max + Min) = (37 - 03)/(37 + 03) = 34/40 = 0.85
2. Consultant Y: coefficient of range = (Max - Min)/(Max + Min) = (13 - 09)/(13 + 09) = 4/22 = 0.18
Consultant X is inconsistent and Consultant Y is consistent in their job.
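The same comparison in Python, as a sketch. It assumes the waiting times split between the two consultants as listed in the example (the flattened source table was read in blocks of two values per consultant); only each consultant's maximum and minimum affect the result:

```python
# Waiting times in minutes (split between X and Y is an assumption, see above)
x_waits = [5, 15, 12, 3, 4, 19, 37, 11, 6, 34]
y_waits = [11, 12, 10, 13, 11, 10, 9, 13, 9, 11]

def coefficient_of_range(data):
    """(Max - Min) / (Max + Min)."""
    hi, lo = max(data), min(data)
    return (hi - lo) / (hi + lo)

for name, waits in (("X", x_waits), ("Y", y_waits)):
    print(name, round(coefficient_of_range(waits), 2))
```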
Uses
QUALITY CONTROL: The objective of quality control is to keep a check on the quality of a product without 100% inspection.
When statistical methods of quality control are used, control charts are prepared in which the range plays an important role.
The basic idea is that as long as manufactured products conform to set standards (range), the production process is assumed to be in control.
WEATHER FORECASTS: The range tells the general public within what limits the temperature is likely to vary on a particular day.
Quartile deviation
It measures the distance between the lowest and highest of the middle 50 percent of the scores of a distribution.
Q.D. is superior to the range because it is based not on the two extreme values but on the middle 50% of the observations.
It can be calculated for distributions with open-ended classes.
It is often used with skewed data because it is insensitive to extreme scores.
Interquartile Range
$$\text{Interquartile range} = Q_3 - Q_1$$
The semi-interquartile range, or quartile deviation, is defined as
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$
The coefficient of quartile deviation is
$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
A small Q.D. indicates high uniformity among the central 50% of observations.
A large Q.D. indicates high variation among the central observations.
Interquartile Range Example
The number of complaints received by the manager of a supermarket was recorded for each of the last 10 working days.
21, 15, 18, 5, 10, 17, 21, 19, 25, 28
Sorted data:
5, 10, 15, 17, 18, 19, 21, 21, 25, 28
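With n = 10:
$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ observation} = \left(\frac{11}{4}\right)^{\text{th}} = 2.75^{\text{th}} \approx 3^{\text{rd}} \text{ observation} = 15$$
$$Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{ observation} = \left(\frac{33}{4}\right)^{\text{th}} = 8.25^{\text{th}} \approx 8^{\text{th}} \text{ observation} = 21$$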
Interquartile range = 21 - 15 = 6 complaints
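The positional method above can be sketched in Python. The function name is hypothetical, and the sketch follows this example's convention of rounding the (n + 1)/4 position to the nearest whole observation; many textbooks and libraries interpolate between neighbouring observations instead, which gives slightly different quartiles:

```python
import math

def quartile_by_position(data, k):
    """k-th quartile (k = 1 or 3): the observation at position k(n + 1)/4
    in the sorted data, rounded to the nearest whole position."""
    ordered = sorted(data)
    pos = k * (len(ordered) + 1) / 4   # 1-based position, e.g. 2.75
    idx = math.floor(pos + 0.5)        # round half up: 2.75 -> 3, 8.25 -> 8
    return ordered[idx - 1]

complaints = [21, 15, 18, 5, 10, 17, 21, 19, 25, 28]
q1 = quartile_by_position(complaints, 1)
q3 = quartile_by_position(complaints, 3)
iqr = q3 - q1                          # interquartile range
qd = iqr / 2                           # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)       # coefficient of Q.D.
print(q1, q3, iqr, qd, round(coeff_qd, 3))
```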
Calculating exactly (grouped data): Q1
Using the formula with the following frequency distribution:

X           f     CF
0 < 20      15    15
20 < 40     60    75
40 < 100    25    100

N/4 = 25th item
This is in the group 20 < 40.
Lower limit (l) is 20
Width of group (i) is 20
Frequency of group (f) is 60
CF of previous group (F) is 15
The formula is:
$$Q_1 = l + \frac{N/4 - F}{f} \times i$$
First Quartile
$$Q_1 = 20 + \frac{25 - 15}{60} \times 20 = 20 + \frac{10 \times 20}{60} = 20 + 3.333 = 23.333$$
This means that 25% of the data lie below 23.333 and 75% lie above it.
Third Quartile
3N/4 = 75th item
This is in the group 20 < 40 (same frequency table as for Q1).
Lower limit (l) is 20
Width of group (i) is 20
Frequency of group (f) is 60
CF of previous group (F) is 15
The formula is:
$$Q_3 = l + \frac{3N/4 - F}{f} \times i$$
$$Q_3 = 20 + \frac{75 - 15}{60} \times 20 = 20 + \frac{60 \times 20}{60} = 20 + 20 = 40$$
So 75% of the data lie below this point (i.e. 40) and 25% lie above it.
Interquartile Range and Coefficient of Q.D.
Interquartile range = 40 - 23.333 = 16.667
The semi-interquartile range, or quartile deviation, is
Q.D. = (Q3 - Q1)/2 = 16.667/2 = 8.333
Coefficient of quartile deviation = (Q3 - Q1)/(Q3 + Q1) = 16.667/63.333 = 0.26
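The grouped-data formula Q = l + (kN/4 - F)/f × i can be sketched as a small Python function (the name and the (lower, upper, frequency) tuple layout are assumptions for illustration). It walks the cumulative frequencies until it reaches the quartile class, then interpolates within that class:

```python
def grouped_quartile(classes, k):
    """Q_k = l + (kN/4 - F) / f * i for grouped data.
    classes: list of (lower, upper, frequency) tuples in ascending order."""
    n = sum(f for _, _, f in classes)
    target = k * n / 4                 # position of the k-th quartile
    cum = 0                            # F: cumulative frequency before the class
    for lower, upper, f in classes:
        if cum + f >= target:          # quartile class found
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("target position beyond the data")

table = [(0, 20, 15), (20, 40, 60), (40, 100, 25)]
q1 = grouped_quartile(table, 1)
q3 = grouped_quartile(table, 3)
print(round(q1, 3), q3, round((q3 - q1) / 2, 3), round((q3 - q1) / (q3 + q1), 2))
```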
Example
Use an appropriate measure to evaluate the variation in the following data:

Weekly income (Rs.)   No. of workers
Below 1350            8
1350-1370             16
1370-1390             39
1390-1410             58
1410-1430             60
1430-1450             40
1450-1470             22
1470-1490             15
1490-1510             15
1510-1530             9
1530 and above        10
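Because the first and last classes are open-ended, the range and mean deviation cannot be computed here, but the quartile deviation can, since the quartiles fall in closed classes. A sketch under that reasoning (open limits are marked with None, a convention chosen for this illustration; the function name is hypothetical):

```python
# (lower, upper, frequency); None marks an open class limit
income = [
    (None, 1350, 8), (1350, 1370, 16), (1370, 1390, 39), (1390, 1410, 58),
    (1410, 1430, 60), (1430, 1450, 40), (1450, 1470, 22), (1470, 1490, 15),
    (1490, 1510, 15), (1510, 1530, 9), (1530, None, 10),
]

def grouped_quartile(classes, k):
    """Q_k = l + (kN/4 - F) / f * i; fails only if Q_k lands in an open class."""
    n = sum(f for _, _, f in classes)
    target = k * n / 4
    cum = 0
    for lower, upper, f in classes:
        if cum + f >= target:
            if lower is None or upper is None:
                raise ValueError("quartile falls in an open-ended class")
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("target position beyond the data")

q1 = grouped_quartile(income, 1)
q3 = grouped_quartile(income, 3)
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 2), round((q3 - q1) / (q3 + q1), 4))
```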
Problems with Quartile Deviation
■ It is not based on all the observations.
■ It is affected by sampling fluctuations.
■ It is not suitable for further algebraic treatment.
Deviation Measures of Dispersion
The deviation from the mean for a representative case i is xi - x̄ (the value minus the mean of x). If almost all of these deviations are small, dispersion is small; if many of them are large, dispersion is large.
This suggests we could construct a measure D of dispersion that is simply the average (mean) of all the deviations.
But this will not work because, as we saw earlier, it is a property of the mean that all deviations from it add up to zero.
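A short illustration using group B's ages from the opening example: the raw deviations cancel exactly, while their absolute values do not, which is the idea behind the mean deviation (variable names are illustrative):

```python
from statistics import mean

ages_b = [8, 15, 20, 28, 54]
m = mean(ages_b)                        # 25
deviations = [x - m for x in ages_b]    # [-17, -10, -5, 3, 29]
print(sum(deviations))                  # always 0 about the mean

# Averaging absolute deviations avoids the cancellation
mad = sum(abs(d) for d in deviations) / len(deviations)
print(mad)                              # 12.8
```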
The Mean Deviation
A practical way around this problem is simply to ignore the fact that some deviations are negative while others are positive, by averaging the absolute values of the deviations.
This measure (called the mean deviation) tells us the average (mean) amount by which the values for all cases deviate (whether higher or lower) from the average (mean) value.
Indeed, the mean deviation is an intuitive, understandable, and perfectly reasonable measure of dispersion, and it is occasionally used in research.
The mean deviation takes all of the values into consideration.
The Mean Deviation (frequency distribution)
If the data are in the form of a frequency distribution, the mean deviation can be calculated using the following formula:
$$\text{MD} = \frac{\sum f\,|x - \bar{x}|}{\sum f}$$
where f is the frequency of an observation x, and n = Σf is the sum of the frequencies.
This measure is an improvement over the previous two measures (range and quartile deviation) in the sense that it considers all observations of a data set.
Coefficient of mean deviation
$$\text{Coefficient of MD} = \frac{\text{Mean deviation}}{\text{Mean}}$$
Example
Find the mean deviation for the following distribution of demand for a book:

Quantity demanded (units) x   Frequency f   fx     |x - x̄|   f|x - x̄|
6                             4             24     17.6       70.4
12                            7             84     11.6       81.2
18                            10            180    5.6        56.0
24                            18            432    0.4        7.2
30                            12            360    6.4        76.8
36                            7             252    12.4       86.8
42                            2             84     18.4       36.8
Total                         60            1416              415.2

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1416}{60} = 23.6$$
$$\text{MD} = \frac{\sum f\,|x - \bar{x}|}{\sum f} = \frac{415.2}{60} = 6.92$$
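The two formulas can be checked with a short Python sketch (the function name is hypothetical) that reproduces the table's totals and also reports the coefficient of mean deviation, MD / mean:

```python
def frequency_mean_deviation(values, freqs):
    """Mean deviation about the mean for a discrete frequency distribution:
    mean = sum(f*x) / sum(f), MD = sum(f*|x - mean|) / sum(f)."""
    n = sum(freqs)
    xbar = sum(f * x for x, f in zip(values, freqs)) / n
    md = sum(f * abs(x - xbar) for x, f in zip(values, freqs)) / n
    return xbar, md

quantity = [6, 12, 18, 24, 30, 36, 42]
frequency = [4, 7, 10, 18, 12, 7, 2]
xbar, md = frequency_mean_deviation(quantity, frequency)
# mean, MD, coefficient of MD
print(round(xbar, 2), round(md, 2), round(md / xbar, 3))
```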
Problems with Mean Deviation
■ Algebraic signs are ignored when taking the deviations of the items.
■ It cannot be computed for distributions with open-ended classes.
■ It is not suitable for further mathematical treatment.
Standard Deviation
The standard deviation is the most commonly used measure of dispersion.
Like the mean deviation, the standard deviation takes into account the value of every observation.
It measures the degree of dispersion of the data from the mean value.
The first step in computing it is to subtract the mean from each of the scores. This difference is called a deviate or a deviation score.
The deviate tells us how far a given score is from the typical, or average, score. Thus, the deviate is a measure of dispersion for a given score.
The standard deviation is a statistic that tells us how tightly the various values in a set of data are clustered around the mean.
A large S.D. indicates that the data points are far from the mean.
A small S.D. indicates that the data points cluster closely around the mean.
Standard Deviation
■ It is the positive square root of the arithmetic mean of the squares of the deviations of the observations from their arithmetic mean.
■ Calculation:
1. Calculate the arithmetic mean (AM).
2. Subtract the AM from each individual value.
3. Square each difference (multiply it by itself).
4. Sum (total) the squared values.
5. Divide the total by the number of values (N).
6. Take the square root of the result.
■ Formula:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
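The six calculation steps map one-to-one onto a few lines of Python. This is a sketch (function name hypothetical) that divides by N, as in the formula for σ, and applies it to the two age groups from the opening example:

```python
import math

def population_sd(data):
    """Standard deviation following the listed steps (divides by N)."""
    n = len(data)
    am = sum(data) / n                        # 1. arithmetic mean
    squared = [(x - am) ** 2 for x in data]   # 2-3. deviations, squared
    return math.sqrt(sum(squared) / n)        # 4-6. total, divide by N, square root

print(population_sd([22, 24, 25, 26, 28]))            # group A: 2.0
print(round(population_sd([8, 15, 20, 28, 54]), 2))   # group B: much larger
```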
The Mean, Deviations, Variance, and SD
What is the effect of adding a constant amount to (or subtracting it from) each observed value?
What is the effect of multiplying (or dividing) each observed value by a constant amount?
Adding (subtracting) the same amount to (from) every observed value changes the mean by the same amount but does not change the dispersion (for either range or deviation measures).
Multiplying every observed value by the same factor changes the mean and the SD [or MD] by that same factor (the SD and MD by its absolute value, if the factor is negative) and changes the variance by that factor squared.
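These two properties are easy to verify numerically. A sketch using Python's standard `statistics` module on group A's ages (the shift of 10 and factor of 3 are arbitrary choices for illustration):

```python
from statistics import mean, pstdev, pvariance

data = [22, 24, 25, 26, 28]
shifted = [x + 10 for x in data]   # add a constant
scaled = [3 * x for x in data]     # multiply by a constant

shift_diff = mean(shifted) - mean(data)            # mean moves by 10
sd_ratio = pstdev(scaled) / pstdev(data)           # SD scales by 3
var_ratio = pvariance(scaled) / pvariance(data)    # variance scales by 3**2

print(shift_diff, pstdev(shifted) == pstdev(data))  # SD unchanged by the shift
print(sd_ratio, var_ratio)
```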
Usefulness
Manufacturers interested in producing items of consistent quality are very much concerned with the S.D.
If the mean life of a component is 4 years and the S.D. is very large, many components will fail well before 4 years.
Quality control requires consistency, and consistency requires a relatively small S.D.
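Variance
The variance is the square of the standard deviation. It is more useful when we move from description to analysis. For a sample, it is computed with n - 1 in the denominator:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Dividing by n instead gives the population variance σ², the square of σ as defined for the standard deviation.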
What Does the Variance Formula Mean?
Variance is the mean of the squared deviation scores.
The larger the variance, the more the scores deviate, on average, from the mean.
The smaller the variance, the less the scores deviate, on average, from the mean.
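Python's `statistics` module offers both versions, which makes the n versus n - 1 distinction concrete. For group A the sum of squared deviations is 20:

```python
from statistics import pvariance, variance

ages_a = [22, 24, 25, 26, 28]
# Population variance: sum of squared deviations (20) divided by n = 5 -> 4
print(pvariance(ages_a))
# Sample variance s^2: the same sum divided by n - 1 = 4 -> 5
print(variance(ages_a))
```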
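Combined Variance (for different means)
$$\sigma_{12}^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$$
where $d_1 = \bar{x}_1 - \bar{x}_{12}$, $d_2 = \bar{x}_2 - \bar{x}_{12}$, and the combined mean is $\bar{x}_{12} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$.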
Exercise 3
■ The mean and S.D. of the ‘lives’ of tyres manufactured by two factories of the ‘Durable’ tyre company, each making 50,000 tyres annually, are given below. Calculate the combined mean and standard deviation of the ‘life’ of all 100,000 tyres produced in a year.

Factory   Sample size   Mean (‘000 km)   SD (‘000 km)
1         50            60               8
2         50            50               7
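A sketch of the combined mean and S.D. for the tyre data, implementing the different-means formula (the function name and the (n, mean, sd) tuple layout are illustrative choices):

```python
import math

def combined_mean_sd(groups):
    """groups: list of (n, mean, sd).
    Combined variance = sum(n_i * (sd_i^2 + d_i^2)) / sum(n_i),
    where d_i = mean_i - combined mean."""
    total = sum(n for n, _, _ in groups)
    cmean = sum(n * m for n, m, _ in groups) / total
    cvar = sum(n * (s ** 2 + (m - cmean) ** 2) for n, m, s in groups) / total
    return cmean, math.sqrt(cvar)

# Factory data from the table above, in '000 km
cmean, csd = combined_mean_sd([(50, 60, 8), (50, 50, 7)])
print(cmean, round(csd, 2))
```

Here d1 = 5 and d2 = -5, so the combined variance is (50 × 89 + 50 × 74)/100 = 81.5.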
Combined Variance (for same means)
$$\sigma_{12}^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2}$$
Example
The following data relate to clients obtained by insurance agents during a given period for two types of insurance policies, a child policy and a retirement policy. Calculate the combined S.D.

                                Child policy   Retirement policy
No. of agents                   25             18
Average no. of clients booked   72             64
Variance of the distribution    8              6
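Because the two averages differ (72 and 64), the different-means formula is the one to apply here; when all means are equal, every d term is zero and it reduces to the same-means formula. A sketch working directly from the given variances (function name hypothetical):

```python
import math

def combined_variance(groups):
    """groups: list of (n, mean, variance).
    Returns sum(n_i * (var_i + d_i^2)) / sum(n_i), d_i = mean_i - combined mean.
    With equal means every d_i is 0, giving sum(n_i * var_i) / sum(n_i)."""
    total = sum(n for n, _, _ in groups)
    cmean = sum(n * m for n, m, _ in groups) / total
    return sum(n * (v + (m - cmean) ** 2) for n, m, v in groups) / total

policies = [(25, 72, 8), (18, 64, 6)]   # (agents, average clients, variance)
var_c = combined_variance(policies)
print(round(var_c, 2), round(math.sqrt(var_c), 2))   # combined variance, S.D.
```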
The Coefficient of Variation
It is the most important relative measure of dispersion. The coefficient of variation is a ratio measure of dispersion (inequality): simply the standard deviation divided by the mean, usually expressed as a percentage. It answers the question: how big is the SD relative to the mean?
$$\text{Coefficient of variation} = \frac{s}{\bar{x}} \times 100$$
It is therefore a useful statistic for comparing the degree of variation from one data series to another.
In finance, it helps us determine how much volatility (risk) we are assuming in comparison to the return we can expect from an investment.
The lower the coefficient of variation, the better the risk-return tradeoff.
The distribution with the larger C.V. is said to be less stable, less uniform, less consistent, and less homogeneous.
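Applied to the two age groups from the opening example, the C.V. makes the difference in consistency explicit. This sketch assumes the population S.D. (dividing by n); using the sample S.D. would change the figures slightly but not the comparison:

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """C.V. = (standard deviation / mean) * 100, in percent (population S.D.)."""
    return 100 * pstdev(data) / mean(data)

for name, ages in (("A", [22, 24, 25, 26, 28]), ("B", [8, 15, 20, 28, 54])):
    print(name, round(coefficient_of_variation(ages), 1))
```

Group B's C.V. is roughly eight times group A's, so B is the less consistent group.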
Measure of Skew
Skew is a measure of the asymmetry of a distribution of scores.
[Figure: distributions with positive skew, negative skew, and a symmetric normal shape (skew = 0)]
Measure of Skewness
A measure of the skewness of a distribution is given by
$$Sk = \frac{3(\text{mean} - \text{median})}{\text{S.D.}}$$
This measure is known as Karl Pearson’s coefficient of skewness, and it lies between -3 and +3.
A distribution is said to be symmetric if mean = median = mode.
A distribution is said to be positively skewed if mean > median > mode.
A distribution is said to be negatively skewed if mean < median < mode.
The closer the coefficient is to zero, the less the skewness. If the coefficient of skewness is 0, the data are exactly balanced.
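A sketch of the coefficient for the two age groups from the opening example, assuming the population S.D. in the denominator:

```python
from statistics import mean, median, pstdev

def pearson_skewness(data):
    """Karl Pearson's coefficient of skewness: 3 * (mean - median) / S.D."""
    return 3 * (mean(data) - median(data)) / pstdev(data)

print(pearson_skewness([22, 24, 25, 26, 28]))           # group A: 0.0, balanced
print(round(pearson_skewness([8, 15, 20, 28, 54]), 2))  # group B: positive skew
```

Group B's mean (25) exceeds its median (20) because of the 54-year-old, which pulls the coefficient above zero.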
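[Figure: bell-shaped (normal) curve showing the relationship between the mean and the standard deviation: about 68% of observations lie within 1 S.D. of the mean, 95% within 2 S.D., and 99.7% within 3 S.D.]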
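The 68%, 95% and 99.7% figures follow from the normal distribution: the share of observations within k standard deviations of the mean is erf(k/√2). A short sketch using Python's `math.erf`:

```python
import math

# Share of a normal distribution lying within k S.D. of the mean
shares = {k: math.erf(k / math.sqrt(2)) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} S.D.: {share:.1%}")
```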