Measures of Central Tendency
Measures of Central Tendency (Location)
Measures of location indicate where on the number line the data are to be found. Common measures of location are:
(i) the Arithmetic Mean,
(ii) the Median, and
(iii) the Mode
The mean is the most widely used average in statistics. It is found by adding up all the values in the data and dividing by how many values there are.
Mean
Notation: If the data values are $x_1, x_2, x_3, \ldots, x_n$, then the mean is
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$
Here $\bar{x}$ is the mean symbol, and $\sum x_i$ means the total of all the x values.
Q 1: Calculate the mean of 1, 2, 3, 4, 5, 6.
Answer:
$$\text{Mean} = \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x_i}{n} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5$$
Q 2: Find the mean of the data 13, 27, 30, 40, 67, 55.
Answer:
$$\text{Mean} = \bar{x} = \frac{\sum x_i}{n} = \frac{13 + 27 + 30 + 40 + 67 + 55}{6} = \frac{232}{6} \approx 38.67$$
Q 3: Find the mean of the daily wages of 10 workers: 13, 16, 15, 15, 18, 15, 14, 18, 16, 10.
Answer:
$$\text{Mean} = \bar{x} = \frac{\sum x_i}{n} = \frac{13 + 16 + 15 + 15 + 18 + 15 + 14 + 18 + 16 + 10}{10} = \frac{150}{10} = 15$$
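The three questions above can be checked with a few lines of Python. This is a minimal sketch; the function name `mean` is chosen here for illustration and is not part of the slides.

```python
def mean(values):
    """Arithmetic mean: add up all the values and divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Data from Q 1, Q 2 and Q 3
print(mean([1, 2, 3, 4, 5, 6]))
print(round(mean([13, 27, 30, 40, 67, 55]), 2))
print(mean([13, 16, 15, 15, 18, 15, 14, 18, 16, 10]))
```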
If data are presented in a frequency table:
Value        $x_1$   $x_2$   …   $x_n$
Frequency    $f_1$   $f_2$   …   $f_n$
then the mean is
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_n f_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum x_i f_i}{\sum f_i}$$
Example: The table shows the results of a survey into household size. Find the mean size.
To find the mean, we add a 3rd column to the table.
Household size, x    Frequency, f    x × f
1                    20              20
2                    28              56
3                    25              75
4                    19              76
5                    16              80
6                    6               36
TOTAL                114             343
Mean = 343 ÷ 114 = 3.01 (to 2 decimal places)
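The frequency-table calculation can be sketched in Python as follows. The function name `frequency_mean` is an illustrative assumption; the data are the household sizes from the example.

```python
def frequency_mean(values, frequencies):
    """Mean of data given as a frequency table: sum(x * f) / sum(f)."""
    total_xf = sum(x * f for x, f in zip(values, frequencies))
    total_f = sum(frequencies)
    return total_xf / total_f


household_sizes = [1, 2, 3, 4, 5, 6]
frequencies = [20, 28, 25, 19, 16, 6]
print(round(frequency_mean(household_sizes, frequencies), 2))
```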
The Median and Mode
• If the sample data are arranged in increasing order, the median is
(i) the middle value if n is an odd number, or
(ii) midway between the two middle values if n is an even number
• The mode is the most commonly occurring value.
Example 1 – n is odd
The reordered systolic blood pressure data seen earlier are:
113, 124, 124, 132, 146, 151, and 170.
The Median is the middle value of the ordered data, i.e. 132.
Two individuals have systolic blood pressure = 124 mm Hg, so the Mode is 124.
Example 2 – n is even
Six men with high cholesterol participated in a study to investigate the effects of diet on cholesterol level. At the beginning of the study, their cholesterol levels (mg/dL) were as follows:
366, 327, 274, 292, 274 and 230.
Rearrange the data in numerical order as follows:
230, 274, 274, 292, 327 and 366.
The Median is halfway between the middle two readings, i.e. (274 + 292) ÷ 2 = 283.
Two men have the same cholesterol level, so the Mode is 274.
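Both examples can be reproduced with a short Python sketch. The function names `median` and `mode` are illustrative; the mode function assumes a single most common value and, if several values tie, simply returns the first one it encounters.

```python
from collections import Counter


def median(values):
    """Middle value of the ordered data (odd n), or midway between the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """Most commonly occurring value (first one found if there is a tie)."""
    counts = Counter(values)
    return max(counts, key=counts.get)


blood_pressure = [113, 124, 124, 132, 146, 151, 170]  # Example 1, n is odd
cholesterol = [366, 327, 274, 292, 274, 230]          # Example 2, n is even

print(median(blood_pressure), mode(blood_pressure))
print(median(cholesterol), mode(cholesterol))
```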
Median in a Frequency Distribution
12.2 – Measures of Central Tendency
Example: Find the median for the distribution.
Value (x)        1   2   3   4   5
Frequency (f)    4   3   2   6   8
The position of the median is the sum of the frequencies divided by 2:
$$\text{Position of the median} = \frac{\sum f}{2} = \frac{23}{2} = 11.5,$$
which rounds up to the 12th term.
Add the frequencies from either side until the sum reaches 12. The 12th term is the median, and its value is 4.
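A minimal Python sketch of this cumulative counting is shown below. The function name `frequency_median` is an assumption made for illustration. For an odd total it picks the single middle term, as in the example; for an even total it averages the two middle terms, following the rule given earlier for raw data.

```python
def frequency_median(values, frequencies):
    """Median of discrete data given as a frequency table sorted by value."""
    n = sum(frequencies)
    # 1-based position(s) of the middle term(s): one if n is odd, two if n is even
    positions = [(n + 1) // 2] if n % 2 == 1 else [n // 2, n // 2 + 1]
    middle_terms = []
    for position in positions:
        running_total = 0
        for x, f in zip(values, frequencies):
            running_total += f  # cumulative frequency so far
            if running_total >= position:
                middle_terms.append(x)
                break
    return sum(middle_terms) / len(middle_terms)


print(frequency_median([1, 2, 3, 4, 5], [4, 3, 2, 6, 8]))
```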
The median of a sample of data organized in a frequency distribution is computed by:
$$\text{Median} = l + \frac{\frac{n}{2} - CF}{f}\,(h)$$
where l is the lower limit of the median class, n is the total number of data values, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, and h is the median class interval.
To determine the median class for grouped data:
1. Construct a cumulative frequency distribution.
2. Divide the total number of data values by 2.
3. Determine which class will contain this value. For example, if n = 50, 50/2 = 25, then determine which class will contain the 25th value.
Movies showing    Frequency    Cumulative Frequency
1 up to 3         1            1
3 up to 5         2            3
5 up to 7         3            6
7 up to 9         1            7
9 up to 11        3            10
From the table, l = 5, n = 10, f = 3, h = 2, CF = 3
$$\text{Median} = l + \frac{\frac{n}{2} - CF}{f}\,(h) = 5 + \frac{\frac{10}{2} - 3}{3}\,(2) = 6.33$$
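The grouped-data formula can be sketched in Python as below. The function name `grouped_median` and the tuple layout `(lower limit, upper limit, frequency)` are assumptions made for this illustration; the median class is taken to be the first class whose cumulative frequency reaches n/2.

```python
def grouped_median(classes):
    """Median = l + ((n/2 - CF) / f) * h for classes given in increasing order.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no data values")
    half = n / 2
    cf = 0  # cumulative frequency preceding the current class
    for lower, upper, f in classes:
        if cf + f >= half:  # this is the median class
            h = upper - lower
            return lower + (half - cf) / f * h
        cf += f


movies = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]
print(round(grouped_median(movies), 2))
```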
Mode in a Frequency Distribution
Example: Find the mode for the distribution.
Value (x) 1 2 3 4 5
Frequency (f) 4 3 2 6 8
The mode in a frequency distribution is the value that has the largest frequency.
The mode for this frequency distribution is 5 as it occurs eight times.
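As a small illustration, the value with the largest frequency can be picked out in Python. The function name `frequency_mode` is hypothetical; it returns a list so that tied modes, if any, are all reported.

```python
def frequency_mode(values, frequencies):
    """Return every value whose frequency equals the largest frequency."""
    largest = max(frequencies)
    return [x for x, f in zip(values, frequencies) if f == largest]


print(frequency_mode([1, 2, 3, 4, 5], [4, 3, 2, 6, 8]))
```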
Example: Comparing the Mean, Median, and Mode
Find the mean, median, and mode of the sample ages of a class shown. Which measure of central tendency best describes a typical entry of this data set? Are there any outliers?
Ages in a class
20 20 20 20 20 20 21
21 21 21 22 22 22 23
23 23 23 24 24 65
Solution: Comparing the Mean, Median, and Mode
Mean: $$\bar{x} = \frac{\sum x}{n} = \frac{20 + 20 + \cdots + 24 + 65}{20} \approx 23.8 \text{ years}$$
Median: $$\frac{21 + 22}{2} = 21.5 \text{ years}$$
Mode: 20 years (the entry occurring with the greatest frequency)
Mean ≈ 23.8 years Median = 21.5 years Mode = 20 years
• The mean takes every entry into account, but is influenced by the outlier of 65.
• The median also takes every entry into account, and it is not affected by the outlier.
• In this case the mode exists, but it doesn't appear to represent a typical entry.
Sometimes a graphical comparison can help you decide which measure of central tendency best represents a data set.
In this case, it appears that the median best describes the data set.
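The effect of the outlier can be seen by computing the three measures with and without the entry 65. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Ages in a class, from the table in the example
ages = [20] * 6 + [21] * 4 + [22] * 3 + [23] * 4 + [24] * 2 + [65]

print(statistics.mean(ages))    # mean of all 20 entries
print(statistics.median(ages))  # midway between the 10th and 11th ordered entries
print(statistics.mode(ages))    # most frequent entry

# Dropping the outlier moves the mean noticeably, the median only slightly
without_outlier = [age for age in ages if age != 65]
print(round(statistics.mean(without_outlier), 1))
print(statistics.median(without_outlier))
```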
Measures of Variation
There are 3 values used to measure the amount of dispersion or variation. (The spread of the group)
1. Range
2. Variance
3. Standard Deviation
Why is it Important?
• You want to choose the best brand of paint for your house. You are interested in how long the paint lasts before it fades and you must repaint. The choices are narrowed down to 2 different paints. The results are shown in the chart. Which paint would you choose?
The chart indicates the number of months a paint lasts before fading.
          Paint A    Paint B
          10         35
          60         45
          50         30
          30         35
          40         40
          20         25
Total     210        210
Does the Average Help?
• Paint A: Avg = 210/6 = 35 months
• Paint B: Avg = 210/6 = 35 months
• They both last 35 months before fading. No help in deciding which to buy.
Consider the Spread
• Paint A: Spread = 60 – 10 = 50 months
• Paint B: Spread = 45 – 25 = 20 months
• Paint B has a smaller spread, which means that it performs more consistently. Choose paint B.
Range
• The range is the difference between the lowest value in the set and the highest value in the set.
• Range = High - Low
Example
• Find the range of the data set.
• 40, 30, 15, 2, 100, 37, 24, 99
• Range = 100 – 2 = 98
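A minimal Python sketch of the range, applied to this data set and to the two paints from the earlier chart. The function name `value_range` is chosen for illustration (it avoids shadowing Python's built-in `range`).

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)


print(value_range([40, 30, 15, 2, 100, 37, 24, 99]))

paint_a = [10, 60, 50, 30, 40, 20]  # months before fading
paint_b = [35, 45, 30, 35, 40, 25]
print(value_range(paint_a), value_range(paint_b))
```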
Mean Deviation from the Mean
Let us first understand what ‘mean deviation’ is. Mean deviation is the mean of the absolute deviations of a set of observations, taken from a definite central value (can be mean, median or anything else).
The keyword to note in the above definition is ‘absolute’: only the numerical value of the deviation is taken, ignoring the sign.
Mean deviation from the mean for raw data (unclassified):
In this case mean deviation from the mean for a set of n observations is given by
$$\text{M.D.}(\bar{x}) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
Mean deviation from the mean for grouped data (classified):
In this case if $x_i$’s are the mid-points of classes with frequency $f_i$, then the mean deviation from the mean is given by
$$\text{M.D.}(\bar{x}) = \frac{\sum_{i=1}^{n} f_i\,|x_i - \bar{x}|}{\sum_{i=1}^{n} f_i}$$
Example: Consider the sample {12, 23, 17, 15, 18}. Find the mean deviation from the mean.
Solution:
$$\bar{x} = \frac{12 + 23 + 17 + 15 + 18}{5} = \frac{85}{5} = 17$$
Data (x)    Deviation from Mean (x − x̄)
12          -5
23           6
17           0
15          -2
18           1
Note: (Always!) $\sum (x - \bar{x}) = 0$
Mean Absolute Deviation
Mean Absolute Deviation: The mean of the absolute values of the deviations from the mean:
$$\text{Mean absolute deviation} = \frac{1}{n}\sum |x - \bar{x}|$$
For the previous example:
$$\frac{1}{n}\sum |x - \bar{x}| = \frac{1}{5}(5 + 6 + 0 + 2 + 1) = \frac{14}{5} = 2.8$$
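The calculation for the sample {12, 23, 17, 15, 18} can be sketched in Python. The function name `mean_absolute_deviation` is an illustrative choice; the sketch also shows that the signed deviations sum to zero, which is why absolute values are needed.

```python
def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the mean: (1/n) * sum(|x - x_bar|)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n


sample = [12, 23, 17, 15, 18]
x_bar = sum(sample) / len(sample)
signed_deviations = [x - x_bar for x in sample]

print(signed_deviations)       # the deviations from the mean
print(sum(signed_deviations))  # these always add up to zero
print(mean_absolute_deviation(sample))
```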
Mean Deviation from the Median
The only difference here is that the mean is replaced by the value of the median.
Mean deviation from the median for raw data (unclassified):
In this case mean deviation from the median for a set of n observations is given by
$$\text{M.D.} = \frac{\sum_{i=1}^{n} |x_i - \text{Median}|}{n}$$
Mean deviation from the median for grouped data (classified):
In this case if $x_i$’s are the mid-points of classes with frequency $f_i$, then the mean deviation from the median is given by
$$\text{M.D.} = \frac{\sum_{i=1}^{n} f_i\,|x_i - \text{Median}|}{\sum_{i=1}^{n} f_i}$$
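Both formulas can be covered by one small Python function that takes the central value as an argument. This is a sketch under stated assumptions: the name `mean_deviation_about` and its parameter layout are hypothetical, and raw data are treated as a frequency table in which every frequency is 1.

```python
def mean_deviation_about(center, values, frequencies=None):
    """Mean of f * |x - center|, divided by the total frequency.

    With frequencies omitted, each value counts once (raw data).
    """
    if frequencies is None:
        frequencies = [1] * len(values)
    total = sum(f * abs(x - center) for x, f in zip(values, frequencies))
    return total / sum(frequencies)


# Raw data: the sample {12, 23, 17, 15, 18} has median 17
print(mean_deviation_about(17, [12, 23, 17, 15, 18]))

# Frequency table: values 1..5 with frequencies 4, 3, 2, 6, 8 have median 4
print(mean_deviation_about(4, [1, 2, 3, 4, 5], [4, 3, 2, 6, 8]))
```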
Solution
Calculation of median and mean deviation
x_i    f_i    Cumulative frequency    |x_i – 12|    f_i |x_i – 12|
5      1      1                       7             7
6      5      6                       6             30
7      11     17                      5             55
8      14     31                      4             56
9      16     47                      3             48
10     13     60                      2             26
11     10     70                      1             10
12     70     140                     0             0
13     4      144                     1             4
15     1      145                     3             3
18     1      146                     6             6
20     1      147                     8             8
       N = 147                                      253
Here, N = 147, so $\frac{N}{2} = 73.5$.
The cumulative frequency just greater than $\frac{N}{2}$ is 140, and the corresponding value of x is 12.
Hence, median = 12.
The value of the median here signifies that for about half the number of days, approximately 12 students were absent.
$$\text{Mean deviation about median} = \frac{1}{N}\sum f_i\,|x_i - 12| = \frac{253}{147} \approx 1.72$$
There are three commonly used measures of spread (or dispersion) – the range, the inter-quartile range and the standard deviation.
Standard deviation
The standard deviation is widely used in statistics to measure spread. It is based on all the values in the data, so it is sensitive to the presence of outliers in the data.
The following formulae can be used to find the variance and s.d.
$$\text{variance} = \frac{\sum (x_i - \bar{x})^2}{n}$$
$$\text{s.d.} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$
The variance is related to the standard deviation:
variance = (standard deviation)$^2$
Example: The mid-day temperatures (in °C) recorded for one week in June were: 21, 23, 24, 19, 19, 20, 21
First we find the mean:
$$\bar{x} = \frac{21 + 23 + \cdots + 21}{7} = \frac{147}{7} = 21\ ^\circ\text{C}$$
$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
21       0                  0
23       2                  4
24       3                  9
19       -2                 4
19       -2                 4
20       -1                 1
21       0                  0
Total:                      22
$$\text{variance} = \frac{\sum (x_i - \bar{x})^2}{n}$$
So variance = 22 ÷ 7 = 3.143
So, s.d. = 1.77°C (3 s.f.)
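The temperature example can be reproduced with a short Python sketch. The function names are illustrative. Note that these formulas divide by n, which corresponds to `statistics.pvariance` and `statistics.pstdev` in Python's standard library rather than `statistics.variance` and `statistics.stdev`, which divide by n − 1.

```python
import math


def variance(values):
    """Variance with divisor n: sum((x - x_bar)^2) / n."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / n


def standard_deviation(values):
    """Positive square root of the variance."""
    return math.sqrt(variance(values))


temperatures = [21, 23, 24, 19, 19, 20, 21]  # mid-day temperatures in °C
print(round(variance(temperatures), 3))
print(round(standard_deviation(temperatures), 2))
```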
Variance and Standard Deviation
Standard deviation is defined as the positive square root of the variance.
The value of the variance and standard deviation for grouped data is given by
Variance,
$$\sigma^2 = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{n} f_i}$$
S.D.,
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{n} f_i}}$$
Short-cut Method to Find Out Mean ($\bar{x}$) and Variance ($\sigma^2$)
In order to reduce the calculations involved in finding the mean and variance for grouped data, the following algorithm can be used.
Algorithm for finding the mean ($\bar{x}$) for grouped data:
1. Write down the frequency table with a column giving the class-marks (mid-points of class intervals).
2. Choose a number ‘A’ (usually the middle or almost middle value of all $x_i$’s) and take deviations $d_i = x_i - A$ about A.
3. Divide each deviation by the class width h. Hence you get $u_i = \frac{d_i}{h}$.
4. Multiply the frequencies ($f_i$) with the corresponding $u_i$. Calculate the sum $\sum f_i u_i$.
5. Find the sum of all frequencies, $\sum_{i=1}^{n} f_i = N$.
6. Use the formula
$$\bar{X} = A + h\left(\frac{1}{N}\sum_{i=1}^{n} f_i u_i\right)$$
Similarly, we can also use a short-cut method to calculate the variance ($\sigma^2$) for grouped data:
1. Write down the frequency table with a column giving the class-marks (mid-points of class intervals).
2. Choose a number ‘A’ (usually the middle or almost middle value of all $x_i$’s) and take deviations $d_i = x_i - A$ about A.
3. Multiply the frequencies ($f_i$) with the corresponding $d_i$. Calculate the sum $\sum f_i d_i$.
4. Obtain the square of the deviations above ($d_i^2$).
5. Multiply the frequencies ($f_i$) with the corresponding $d_i^2$. Calculate the sum $\sum f_i d_i^2$.
6. Find the sum of all frequencies, $\sum_{i=1}^{n} f_i = N$.
7. Use the formula
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i d_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i d_i\right)^2$$
Class Exercise - 2
The following data represents the expenditure pattern of a student for the month of July. The student gets Rs. 50 every day as pocket money.
Expenditure (Rs.)    Frequency (No. of days)
0-10                 5
10-20                12
20-30                8
30-40                4
40-50                2
Calculate the mean and standard deviation.
Solution
Calculation of mean
Class    x_i    f_i    f_i x_i
0-10     5      5      25
10-20    15     12     180
20-30    25     8      200
30-40    35     4      140
40-50    45     2      90
                N = 31    635
Hence, mean
$$\bar{x} = \frac{1}{N}\sum f_i x_i = \frac{635}{31} = 20.48$$
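Calculation of standard deviation
x_i    f_i    u_i = (x_i – 25)/10    f_i u_i    u_i^2    f_i u_i^2
5      5      –2                     –10        4        20
15     12     –1                     –12        1        12
25     8      0                      0          0        0
35     4      1                      4          1        4
45     2      2                      4          4        8
       N = 31                        –14                 44
Hence, variance
$$\sigma^2 = h^2\left[\frac{1}{N}\sum f_i u_i^2 - \left(\frac{1}{N}\sum f_i u_i\right)^2\right] = 100\left[\frac{44}{31} - \left(\frac{-14}{31}\right)^2\right] = 121.54$$
Hence,
$$\sigma = \sqrt{121.54} = 11.02$$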
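The short-cut method applied to this exercise can be sketched in Python. The function name `shortcut_mean_variance` and its parameter names are assumptions made for illustration. It works with $u_i = (x_i - A)/h$ throughout, so the variance is scaled back by $h^2$, as in the solution above.

```python
import math


def shortcut_mean_variance(midpoints, frequencies, assumed_mean, class_width):
    """Step-deviation method: returns (mean, variance) for grouped data."""
    n_total = sum(frequencies)
    # u_i = (x_i - A) / h
    u = [(x - assumed_mean) / class_width for x in midpoints]
    sum_fu = sum(f * ui for f, ui in zip(frequencies, u))
    sum_fu2 = sum(f * ui ** 2 for f, ui in zip(frequencies, u))
    mean = assumed_mean + class_width * sum_fu / n_total
    variance = class_width ** 2 * (sum_fu2 / n_total - (sum_fu / n_total) ** 2)
    return mean, variance


midpoints = [5, 15, 25, 35, 45]  # class-marks of 0-10, 10-20, ..., 40-50
days = [5, 12, 8, 4, 2]          # number of days in each class
mean, variance = shortcut_mean_variance(midpoints, days, assumed_mean=25, class_width=10)
print(round(mean, 2), round(variance, 2), round(math.sqrt(variance), 2))
```

Choosing a different assumed mean A changes the intermediate sums but not the final mean or variance.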
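There is an alternative formula which is usually a more convenient way to find the variance.
We know that
$$\text{variance} = \frac{\sum (x_i - \bar{x})^2}{n}$$
But,
$$\sum (x_i - \bar{x})^2 = \sum (x_i^2 - 2\bar{x}x_i + \bar{x}^2) = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 = \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2$$
So,
$$\text{variance} = \frac{\sum x_i^2}{n} - \bar{x}^2$$
Therefore,
$$\text{s.d.} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2}$$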
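Example (continued): Looking again at the temperature data for June: 21, 23, 24, 19, 19, 20, 21
$$\bar{x} = \frac{147}{7} = 21\ ^\circ\text{C}$$
Also, $\sum x_i^2 = 21^2 + 23^2 + \cdots + 21^2 = 3109$
$$\text{variance} = \frac{\sum x_i^2}{n} - \bar{x}^2 = \frac{3109}{7} - 21^2 = 3.143$$
$$\text{s.d.} = \sqrt{3.143} = 1.77\ ^\circ\text{C}$$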
Note: Essentially the standard deviation is a measure of how close the values are to the mean value.
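As a quick check that the two variance formulas agree on the June temperature data, here is a minimal Python sketch. The function names are illustrative only.

```python
import math


def variance_definition(values):
    """sum((x - x_bar)^2) / n"""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / n


def variance_alternative(values):
    """sum(x^2) / n - x_bar^2"""
    n = len(values)
    x_bar = sum(values) / n
    return sum(x ** 2 for x in values) / n - x_bar ** 2


temperatures = [21, 23, 24, 19, 19, 20, 21]
print(sum(x ** 2 for x in temperatures))  # sum of squares
print(round(variance_definition(temperatures), 3))
print(round(variance_alternative(temperatures), 3))
print(round(math.sqrt(variance_alternative(temperatures)), 2))
```

The alternative formula is convenient by hand, but in floating-point arithmetic it subtracts two nearly equal numbers when the values are large relative to their spread, so the definition form is usually the safer choice in code.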
Calculating standard deviation from a table
When the data are presented in a frequency table, the formula for finding the standard deviation needs to be adjusted slightly:
$$\text{s.d.} = \sqrt{\frac{\sum f_i x_i^2}{\sum f_i} - \bar{x}^2}$$
Example: A class of 20 students were asked how many times they exercise in a normal week. Find the mean and the standard deviation.
The table can be extended to help find the mean and the s.d.
No. of times exercise taken, x    Frequency, f    x × f    x² × f
0                                 5               0        0
1                                 3               3        3
2                                 5               10       20
3                                 4               12       36
4                                 2               8        32
5                                 1               5        25
TOTAL:                            20              38       116
$$\bar{x} = \frac{38}{20} = 1.9$$
$$\text{s.d.} = \sqrt{\frac{\sum f_i x_i^2}{\sum f_i} - \bar{x}^2} = \sqrt{\frac{116}{20} - 1.9^2} = 1.48$$
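The extended-table calculation can be sketched in Python as follows. The function name `frequency_mean_sd` is an illustrative assumption; it builds the x × f and x² × f totals and then applies the frequency-table formula.

```python
import math


def frequency_mean_sd(values, frequencies):
    """Return (mean, s.d.) for data given as a frequency table."""
    total_f = sum(frequencies)
    total_xf = sum(x * f for x, f in zip(values, frequencies))
    total_x2f = sum(x ** 2 * f for x, f in zip(values, frequencies))
    mean = total_xf / total_f
    sd = math.sqrt(total_x2f / total_f - mean ** 2)
    return mean, sd


times_exercised = [0, 1, 2, 3, 4, 5]
students = [5, 3, 5, 4, 2, 1]
mean, sd = frequency_mean_sd(times_exercised, students)
print(round(mean, 2), round(sd, 2))
```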