Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
(A) MEASURE OF CENTRAL TENDENCY :
โข A measure of central tendency is a summary statistic that represents thecenter point or typical value of a dataset. You can think of it as thetendency of data to cluster around a middle value.
โข In statistics, the three most common measures of central tendency arethe mean, median, and mode. Each of these measures calculates thelocation of the central point using a different method.
(1) Mean ( เดฅ๐ฟ ):
โข The mean is the arithmetic average, and it is probably the measure of central tendency that
you are most familiar. Calculating the mean is very simple, you just add up all of the values
and divide by the number of observations in your dataset. For the data set ๐ฅ1, ๐ฅ2, โฆ , ๐ฅ๐ the
mean is given as follow
เดฅ๐ฟ =๐๐+๐๐+โฏ+๐๐
๐=
ฯ๐=๐๐ ๐๐
๐,
Example :- Statistics grade : 20, 30, 35, 45
เดฅ๐ฟ =๐๐+๐๐+๐๐+๐๐
๐= ๐๐ ,
Statistics grade : 1, 20, 30, 35, 45
เดฅ๐ฟ =๐+๐๐+๐๐+๐๐+๐๐
๐= ๐๐. ๐.
Statistics grade : 20, 30, 35, 45, 100
เดฅ๐ฟ =๐๐+๐๐+๐๐+๐๐+๐๐๐
๐= ๐๐,
โข The calculation of themean incorporates allvalues in the data.
โข If you change any value,the mean changes. Sothe mean is sensitive toextreme values.
(2) Median :
โข The median is the middle value. It is the value that splits the dataset in half. To find
the median :
โข Order your data from smallest to largest.
โข The method for locating the median varies slightly depending on whether your
dataset has an even or odd number of values.
Median =
๐ ๐+๐
๐
, ๐ข๐ ๐ญ๐ก๐ ๐ฌ๐๐ฆ๐ฉ๐ฅ๐ ๐ฌ๐ข๐ณ๐ ๐ข๐ฌ ๐๐ง ๐จ๐๐ ๐ง๐ฎ๐ฆ๐๐๐ซ
๐ ๐๐+๐ ๐
๐+๐
๐, ๐ข๐ ๐ญ๐ก๐ ๐ฌ๐๐ฆ๐ฉ๐ฅ๐ ๐ฌ๐ข๐ณ๐ ๐ข๐ฌ ๐๐ง ๐๐ฏ๐๐ง ๐ง๐ฎ๐ฆ๐๐๐ซ
Smallest Largest
10, 20, 30
๐ฅ2 = 20
10, 20, 30,40๐ฅ2 + ๐ฅ3
2
=20 + 30
2= 25
Example :- Statistics grade : 30, 20, 60, 45, 35
โข 20 , 30 , 35 , 45 , 60
Sample size is an odd number โ ๐ = ๐ โ Median = ๐ ๐+๐
๐
= ๐ ๐+๐
๐
= ๐ ๐ = ๐๐.
Statistics grade : 30, 20, 60, 45, 35, 10
โข 10, 20 , 30 , 35 , 45 , 60
Sample size is an even number โ ๐ = ๐ โ Median =๐ ๐
๐+๐ ๐
๐+๐
๐=
๐ ๐๐
+๐ ๐๐+๐
๐=
๐ ๐ +๐ ๐
๐
=๐๐+๐๐
๐= ๐๐. ๐.
โข The median value doesnโt depend on all the values in the dataset. Consequently,
when some of the values are more extreme, the effect on the median is smaller.
(3) Mode :
โข The mode is the value that occurs the most frequently in your data set.
Example:- Statistics grade : 30, 30, 20, 60, 60, 60, 45, 35
Mode = 60.
Statistics grade : 30, 30, 30, 20, 60, 60, 60, 45, 35
Mode = 30 , 60.
Statistics grade : 30, 20, 60, 45, 35
No mode
โข If the data have multiple values that are tied for occurring the most frequently, you have
a multimodal distribution.
โข If no value repeats, the data do not have a mode.
Mode = 0
Note that:โข In a symmetric distribution, the mean locates the center accurately. However, in a skewed
distribution, the mean can miss the mark. In the histogram above, it is starting to fall outside
the central area. This problem occurs because outliers have a substantial impact on the mean.
Extreme values in an extended tail pull the mean away from the center. As the distribution
becomes more skewed, the mean is drawn further away from the center.
โข When to use the mean: Symmetric distribution.
โข When you have a symmetrical distribution for continuous data, the mean, median, and
mode are equal. In this case, analysts tend to use the mean because it includes all of the
data in the calculations.
โข When you have a skewed distribution, the median is a better measure of central
tendency than the mean.
โข For categorical data, you must use the mode.
SHEET (2)
9. The lengths of time, in minutes, that 10 patients waited in a doctor's office beforereceiving treatment were recorded as follows: 5, 11, 9, 5, 10, 15, 6, 10, 5, and 10.Treating the data as a random sample, find (a) the mean; (b) the median; (c) the mode.
Answer
(a) Mean =๐+๐๐+๐+โฏ+๐๐
๐๐= ๐. ๐
(b) 5 5 5 6 9 10 10 10 11 15
๐ = ๐๐ โ even โ
Median =๐๐+๐๐
๐=
๐+๐๐
๐= ๐. ๐
(c) Mode = 5 , 10.
Sheet (2): 8, 10.
(B) MEASURE OF DISPERSION :
โข The measures of central tendency are not adequate to describe data.Two data sets can have the same mean, but they can be entirelydifferent. Thus to describe data, one needs to know the extent ofvariability. This is given by the measures of dispersion.
โข In statistics, dispersion (also called variability, scatter, or spread) isthe extent to which a distribution is stretched or squeezed.
โข The three most common measures of dispersion arethe Range, interquartile range, and (variance , standard deviation).
(1) Range ( R ):
โข The range is the difference between the largest and the smallest observation in
the data.
Range โ R โ = largest value โ smallest value
โข Advantage : Is that it is easy to calculate.
โข Disadvantages : It is very sensitive to outliers and does not use all the
observations in a data set.
Example :- Statistics grade ( A ) : 20, 30, 35, 45
Statistics grade ( B ) : 1, 30, 35, 45
Statistics grade ( C ) : 20, 30, 35, 100
Who had the greater spread of marks?
๐ has a great spread of marks, because ๐ ๐ถ > ๐ ๐ต > ๐ ๐ด
๐ ๐ด = 45 โ 20 = 25
๐ ๐ต = 45 โ 1 = 44
๐ ๐ถ = 100 โ 20 = 80
(2) Interquartile Range ( IQR ):
โข The interquartile range describes the middle 50% of observations.
โข If the interquartile range is large it means that the middle 50% of observations are
spaced wide apart.
โข Advantage : Is that it is not affected by extreme values.
โข Disadvantages : It does not use all the observations in a data set.
โ To get the value of the interquartile range, follow these steps:
โข put the data in order โ from smallest to largestโ
โข Find the median of all data : ๐ธ๐
โข Find the median of the data before ๐ธ๐: ๐ธ๐
โข Find the median of the data after ๐ธ๐: ๐ธ๐
โข IQR = ๐ธ๐- ๐ธ๐
Smallest Largest๐ธ๐
๐๐% ๐๐%
๐ธ๐
25% 25%
๐ธ๐
25% 25%
IQR
the middle ๐๐%
โข Example (1) :- Statistics grade : 60, 35, 25, 45, 30, 37, 45, 10
(1) 10 25 30 35 37 45 45 60
(2) n = 8 โ even ๐ธ๐ =๐๐+๐๐
๐=
๐๐+๐๐
๐= ๐๐
(3) 10 25 30 35
๐ธ๐ =๐๐+๐๐
๐= ๐๐. ๐
(4) 37 45 45 60
๐ธ๐ =๐๐+๐๐
๐= ๐๐
(5) IQR = ๐ธ๐ - ๐ธ๐ = 45 - 27.5 = 17.5
โข Example (2):- Statistics grade : 60, 35, 45, 30, 37, 45, 10
(1) 10 30 35 37 45 45 60
(2) n = 7 โ odd, ๐ธ๐ = ๐ ๐+๐
๐
= ๐ ๐
๐
= ๐ ๐ = ๐๐
(3) 10 30 35
๐ธ๐ = ๐๐
(4) 45 45 60
๐ธ๐ = ๐๐
(5) IQR = ๐ธ๐ โ๐ธ๐= 45 - 30 = 15
(3) Variance ( ๐บ๐ ) and Standard deviation ( S ):
Variance ( ๐บ๐ ):
โข It measures how far a set of numbers is spread out from their average value. The
Variance is the average of the squared differences from the Mean.
โข Let ๐๐, ๐๐, โฆ , ๐๐ be a sample of size ๐. To calculate the variance follow these steps:
โช Work out the Mean ( the simple average of the numbers ) ๐ฟ
โช Then for each number: subtract the Mean and square the result (the squared
difference)
โช Then work out the average of those squared differences.
๐๐ =ฯ๐=๐๐ ๐๐ โ ๐ฟ
๐
๐ โ ๐
๐๐ โ ๐ฟ๐+ ๐๐ โ ๐ฟ
๐+โฏ+ ๐๐ โ ๐ฟ
๐= ฯ๐=๐
๐ ๐๐ โ ๐ฟ๐.
the data is a Sample (aselection taken from a biggerPopulation).
Standard deviation ( ๐บ๐ ):
โข The Standard Deviation is a measure of how spread out numbers. It is a measure of how
far typical value tends to be from the mean.
โข It is the square root of the Variance ๐บ = ๐บ๐.
Example : 9, 2, 5, 4, 12, 7.
Find the Standard Deviation?
โ ๐ = ๐.
โ ๐ฟ =ฯ๐=๐๐ ๐๐
๐=
๐+๐+โฏ+๐
๐= ๐. ๐
โ ฯ๐=๐๐ ๐๐ โ ๐ฟ
๐= ๐ โ ๐. ๐ ๐ + ๐ โ ๐. ๐ ๐ +โฏ+ ๐ โ ๐. ๐ ๐ = ๐๐. ๐
โ ๐บ๐ =ฯ๐=๐๐ ๐๐โ๐ฟ
๐
๐โ๐=
๐๐.๐
๐= ๐๐. ๐
โ ๐บ = ๐บ๐ = ๐๐. ๐ = ๐. ๐๐๐
SHEET (2)
11. The following measurements were recorded for the drying time, in hours, of a certain
brand of latex paint.
Answer
(a) Sample Mean ๐ฟ =ฯ๐=๐๐๐ ๐๐
๐๐=
๐.๐+๐.๐+โฏ+๐.๐
๐๐= ๐. ๐๐๐
(b) 2.5 2.8 2.8 2.9 3.0 3.3 3.4 3.6 3.7 4.0 4.4 4.8 4.8 5.2 5.6n : odd Median = ๐ ๐+๐
๐
= ๐ ๐๐
๐
= ๐ ๐ = ๐. ๐
(c) ฯ๐=๐๐๐ ๐๐ โ ๐ฟ
๐= ๐. ๐ โ ๐. ๐๐๐ ๐ +โฏ+ ๐. ๐ โ ๐. ๐๐๐ ๐ = ๐๐. ๐๐๐๐๐
๐บ๐ =ฯ๐=๐๐๐ ๐๐โ๐ฟ
๐
๐๐โ๐=
๐๐.๐๐๐๐๐
๐๐= ๐. ๐๐๐ ๐บ = ๐บ๐ = ๐. ๐๐๐ = ๐. ๐๐๐
3.4 2.5 4.8 2.9 3.6
2.8 3.3 5.6 3.7 2.8
4.4 4.0 5.2 3.0 4.8
(a)Calculate the sample mean for these data.
(b) Calculate the sample median.
(c) Compute the sample variance and
sample standard deviation.
๐ = 5 ร 3 = 15
BOX PLOT
โข A boxplot, also called a box and whisker plot, is a way to show thespread and skewness of a data set.
โข Steps:
โขput the data in order โ from smallest to largestโ
โข Find the median of all data : ๐ธ๐ โ Medianโ
โข Find the median of the data before ๐ธ๐: ๐ธ๐ โ Lower Quartile โ
โข Find the median of the data after ๐ธ๐: ๐ธ๐ โ upper Quartile โ
๐ธ๐ ๐ธ๐ ๐ธ๐Smallest Largest
โข The lines extending parallel from the boxes are known as the โwhiskersโ,which are used to indicate variability outside the upper and lowerquartiles.
โข Skewness:๐ธ๐ ๐ธ๐ ๐ธ๐
๐ธ๐ ๐ธ๐ ๐ธ๐ ๐ธ๐ ๐ธ๐ ๐ธ๐
๐ธ๐ โ ๐ธ๐ > ๐ธ๐ โ๐ธ๐ ๐ธ๐ โ ๐ธ๐ < ๐ธ๐ โ๐ธ๐ ๐ธ๐ โ ๐ธ๐ โ ๐ธ๐ โ๐ธ๐
+ve, right skew -ve, left skew Symmetric , Normal
Outliers:
โข Outliers are data points that are far from other data points. In other words,
theyโre unusual values in a dataset.
โข Calculating the Outlier Fences Using the Interquartile Range :
โข any value > ๐ธ๐ + ๐. ๐ โ ๐ฐ๐ธ๐น
โข any value < ๐ธ๐ โ ๐. ๐ โ ๐ฐ๐ธ๐น
๐ธ๐ ๐ธ๐ ๐ธ๐
Smallest Largest
IQR 1.5*IQR1.5*IQR ๐ธ๐ +1.5*IQR๐ธ๐ โ1.5*IQR
SHEET (2)
2. Draw a box plot for the following set of data.
4.7, 3.8, 3.9, 3.9, 4.6, 4.5, 5
Answer
โข Organize the data from the smallest to largest:
3.8 3.9 3.9 4.5 4.6 4.7 5
โข Smallest = ๐. ๐ , Largest = ๐
โข ๐ธ๐ = ๐. ๐
โข ๐ธ๐ = ๐. ๐
โข ๐ธ๐ = ๐. ๐
SHEET (2)
6. For the following list of numbers, draw a box and whisker plot showing outliersvalues:
-6, 4.5, 5.5, 6.5, 8.5, 10, 12, 30
Answer
โข Smallest = โ๐ , Largest = ๐๐
-6 4.5 5.5 6.5 8.5 10 12 30
โข ๐ธ๐ =๐.๐+๐.๐
๐= ๐. ๐
โข ๐ธ๐ =๐.๐+๐.๐
๐= ๐
โข ๐ธ๐ =๐๐+๐๐
๐= ๐๐
๐. ๐
โข Outliers:
โข ๐ฐ๐ธ๐น = ๐ธ๐ โ๐ธ๐ = ๐๐ โ ๐ = ๐
โข๐. ๐ โ ๐ฐ๐ธ๐น = ๐. ๐ โ ๐ = ๐
โขany value > ๐ธ๐ + ๐. ๐ โ ๐ฐ๐ธ๐น
> ๐๐+ ๐
> ๐๐ 30
โขany value < ๐ธ๐ โ ๐. ๐ โ ๐ฐ๐ธ๐น
< ๐ โ ๐
< โ๐ - 6
Outliers = 30, -6
Sheet (2): 1, 3, 5, 7.
SHEET (2)
4. The box and whisker plot below was drawn using a list of numbers (data).Determine if each statement is definitely true, definitely false, or cannot bedetermined.
(a) Half the data falls between 1 and 3. (definitely true)
(b) The number 5 must be in the list of numbers from which this plot was drawn. (definitely true)
(c) The number 1.5 must be in the list of numbers from which this plot was drawn. (cannot be determined)
๐ธ๐ = ๐๐ธ๐ = ๐. ๐๐ธ๐ = ๐Largest = 5
SHEET (2)
Which must be in the list of numbers from which this box and whisker plot wasdrawn?
(a) 10
(b) 40
(c) 50
(d) 55
(e) 65
๐ธ๐ = ๐๐๐ธ๐ = ๐๐Smallest = 10
Largest = 55
SHEET (2)
19. A class has eight assessment tasks over a year. The parallel box plots show thesummary of the marks for the assessments for two students, Jamie and Maryanne.
Jamie๐ธ๐ = ๐๐๐ธ๐ = ๐๐๐ธ๐ = ๐๐Smallest = 35
Largest = 90
Maryanne๐ธ๐ = ๐๐๐ธ๐ = ๐๐๐ธ๐ = ๐๐Smallest = 45
Largest = 75
(i) Who scored the highest mark? (Jamie)
(ii) Who scored the lowest mark? (Jamie)
(iii) What was the range of marks for each student?
๐นJamie = ๐๐ โ ๐๐ = ๐๐
๐น๐๐๐๐ฆ๐๐๐๐ = ๐๐ โ ๐๐ = ๐๐
(iv) Who had the greater spread of marks?
๐นJamie > ๐น๐๐๐๐ฆ๐๐๐๐(Jamie)
(v) What was the interquartile range for each student?
๐ฐ๐ธ๐นJamie = ๐๐ โ ๐๐ = ๐๐
๐ฐ๐ธ๐น๐๐๐๐ฆ๐๐๐๐ = ๐๐ โ ๐๐ = ๐๐