TOPIC 7 : STATISTICS (FORM 4)
Presented by
Nur Shakirin Sabri
Nur Ayuni Daud
Nornaimah Rodzi
Statistics
Measures of central tendency
Mean
Median
Mode
Measures of dispersion
Range
Quartiles
Variance and standard deviation
Ungrouped and Grouped Data
Ungrouped data is where the values are handled individually.
e.g. 1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17
Grouped data is where the values are grouped into classes, because we may sometimes collect a large amount of data with varying values, e.g.
Circumference (cm)   1 - 2   3 - 4   5 - 6   7 - 8
No. of branches        2      10      14       8
Mean
The mean of an ungrouped set of data is the sum of all the values in the set divided by the number of values.
Let x be any value in the set of data, N be the number of values, and $\bar{x}$ be the mean of the set of data. Then
$\bar{x} = \frac{\sum x}{N}$
Example: Find the mean of the following data.
1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17
$\bar{x} = \frac{1 + 3 + 6 + 6 + 6 + 6 + 7 + 7 + 12 + 12 + 17}{11} = \frac{83}{11} \approx 7.545$
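The calculation above can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative.

```python
# Ungrouped data from the example above.
data = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]

# Mean = sum of the values divided by the number of values.
mean = sum(data) / len(data)
print(sum(data), len(data), round(mean, 3))
```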
Mean of grouped data
For grouped data, we take the midpoint of the class, known as the class mark, to represent the class.
Let f be the frequency of each class and x be the corresponding class mark. Then
$\bar{x} = \frac{\sum fx}{\sum f}$
Example:
Circumference (cm)    1 - 2           3 - 4           5 - 6           7 - 8
No. of branches, f      2              10              14               8
Midpoint, x         (1+2)/2 = 1.5   (3+4)/2 = 3.5   (5+6)/2 = 5.5   (7+8)/2 = 7.5
$\bar{x} = \frac{(1.5)(2) + (3.5)(10) + (5.5)(14) + (7.5)(8)}{2 + 10 + 14 + 8} = \frac{175}{34} \approx 5.147$
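As a rough sketch, the grouped mean for the table above can be computed as follows. The names `classes`, `freqs` and `marks` are illustrative only.

```python
# Class limits and frequencies from the circumference table.
classes = [(1, 2), (3, 4), (5, 6), (7, 8)]
freqs = [2, 10, 14, 8]

# Class mark = midpoint of the lower and upper class limits.
marks = [(lo + hi) / 2 for lo, hi in classes]

# Grouped mean = sum(f * x) / sum(f).
grouped_mean = sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)
print(marks)                    # [1.5, 3.5, 5.5, 7.5]
print(round(grouped_mean, 3))   # 5.147
```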
Median
When the values of a set of data are arranged in ascending or descending order, the value that lies in the middle is called the median.
If n (the number of values) is an odd number, the median is the $\frac{n+1}{2}$th value.
If n is an even number, the median is the mean of the $\frac{n}{2}$th value and the $\left(\frac{n}{2} + 1\right)$th value.
Example: 1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17
The median is the $\frac{n+1}{2}$th value $= \frac{11 + 1}{2}$th value = 6th value = 7
Mode
Mode is the value that occurs the most frequently in a set of data.
Example: 1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17
Mode = 6
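The median and mode rules can be sketched in Python as below. The helper names `median` and `mode` are our own, and `mode` simply returns the first most frequent value if there is a tie.

```python
from collections import Counter

def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the (n+1)/2-th value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the n/2-th and (n/2+1)-th values

def mode(values):
    """Most frequent value in the data."""
    return Counter(values).most_common(1)[0][0]

data = [1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17]
print(median(data))  # 7
print(mode(data))    # 6
```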
Modal Class of Grouped Data
The class of grouped data that has the highest frequency is known as the modal class.
Example:
Circumference (cm)   10-19   20-29   30-39   40-49
No. of trees           14      19      23      10
Q: Determine the modal class.
A: The class having the highest frequency is 30-39 cm. Hence, the modal class is 30-39 cm.
Finding the mode from histogram
If the distribution of a set of grouped data is represented by a histogram, we can estimate the mode of the distribution from the histogram.
Steps to find the mode from a histogram:
1. Determine the modal class in the histogram.
2. Join each top vertex of the modal class bar diagonally to the nearest top vertex of the adjacent bar on the opposite side.
3. Determine the value on the horizontal axis at the intersection of the two lines. This value represents the mode.
[Figure: histogram with the modal class and the estimated mode marked]
Median from cumulative frequency distribution table
The median of a grouped set of data can be calculated from a cumulative frequency distribution table using the following formula:
Median $= L + \left(\frac{\frac{N}{2} - F}{f_m}\right) C$
where
L = lower boundary of the class in which the median lies
N = total frequency
F = cumulative frequency before the class in which the median lies
C = class interval (size of the class)
$f_m$ = frequency of the class in which the median lies
Example: Calculate the median.
Circumference (cm)        1 - 2   3 - 4   5 - 6   7 - 8
No. of branches, f          2      10      14       8
Cumulative frequency, F     2      12      26      34
The median lies between the 17th and 18th branches, which is within the class 5 - 6.
The cumulative frequency before the class 5 - 6, F = 12
Lower boundary, L = 4.5
Class interval, C = 2
Frequency of the class, $f_m$ = 14
Median $= 4.5 + \left(\frac{17 - 12}{14}\right)(2) \approx 5.21$
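A minimal sketch of the median formula, using the values identified in the example. The function name `grouped_median` is illustrative, and the arguments follow the symbols L, N, F, C and fm defined above.

```python
def grouped_median(L, N, F, C, fm):
    """Median of grouped data: L + ((N/2 - F) / fm) * C."""
    return L + ((N / 2 - F) / fm) * C

# Values for the class 5 - 6 in the circumference table.
result = grouped_median(L=4.5, N=34, F=12, C=2, fm=14)
print(round(result, 3))  # 5.214
```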
Estimating the median of grouped data from an ogive.
We can use an ogive to estimate the median of grouped data.
[Figure: ogive with the median marked]
Effects of uniform changes in data on mean, median and mode
Activity: To find the effects on the measures of central tendency if every value of the data is changed uniformly.
Given 4, 12, 5, 7, 9, 6, 10, 6, 13
a) Find the mean, median and mode.
b) Add 3 to every value. Find the mean, median and mode and compare with the answers in (a).
c) Multiply every value by 2. Find the mean, median and mode and compare with the answers in (a).
d) Discuss the findings.
From the activity, we find that if u is the original measure of central tendency (mean, median or mode) and v is its new value after a uniform change in every value of the data, then we have
v = u + k if k is added to every value of the data.
v = cu if every value of the data is multiplied by c.
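The activity can be checked with Python's standard `statistics` module. This is only a sketch; the helper `summary` is our own name.

```python
import statistics

def summary(values):
    """Return (mean, median, mode) of a list of numbers."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))

data = [4, 12, 5, 7, 9, 6, 10, 6, 13]

original = summary(data)                  # (8, 7, 6)
added = summary([x + 3 for x in data])    # (11, 10, 9): each measure increases by 3
doubled = summary([2 * x for x in data])  # (16, 14, 12): each measure is doubled
print(original, added, doubled)
```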
Extreme values in the data
30, 28, 120, 43, 35, 9
The numbers 9 and 120 are considered extreme values in this set of data.
Extreme values significantly affect the mean of the data, but have little or no effect on the median and mode.
Effect of adding or removing a value from a set of data.
The effect is uncertain. Generally, the mean or median will shift to a higher value when values greater than the original mean or median are added, or when values smaller than the original mean or median are removed.
However, the mean, median or mode will generally remain unchanged if the value added or removed is equal to that measure of central tendency.
Determining the most suitable measure of central tendency.
Question: Determine the most suitable measure of central tendency for each of these sets of data.
1. 42, 30, 39, 40, 35, 30
Mean = 36, Median = 37, Mode = 30
2. 42, 30, 39, 40, 35, 30, 120
Mean = 48, Median = 39, Mode = 30
Answer:
1. Mean and median
2. Median
Mean is widely used because all the values in a set of data are taken into account.
But, if a set of data contains extreme values, the mean could deviate from its central tendency and may not represent the general characteristics of the data.
Under such circumstances, median will be a better representative measure of central tendency because median eliminates the effects of extreme values in the set of data.
Mode is usually used to represent a set of data containing a large number of values which take only some specific values and many repeated values.
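The effect of the extreme value 120 in the question above can be reproduced with a short sketch using the standard `statistics` module. The mean moves from 36 to 48, while the median only moves from 37 to 39 and the mode stays at 30.

```python
import statistics

without_extreme = [42, 30, 39, 40, 35, 30]
with_extreme = without_extreme + [120]   # same data plus the extreme value

for label, values in (("without 120", without_extreme),
                      ("with 120", with_extreme)):
    print(label,
          "mean =", statistics.mean(values),
          "median =", statistics.median(values),
          "mode =", statistics.mode(values))
```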
Range
Range of ungrouped data = largest value – smallest value
Example: Find the range of the data.
2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24
Answer: Range = 24 – 2 = 22
Interquartile range of ungrouped data
Quartiles are three values in a set of data which divide the data into four quarters with each quarter having the same number of values.
Example: 2, 4, [6], 7, 10, [15], 16, 19, [20], 21, 24
Q1 = lower quartile = 6, Q2 = median = 15, Q3 = upper quartile = 20, with 2 numbers in each of the four quarters.
Interquartile range = Q3 – Q1 = 20 – 6 = 14
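The range and quartiles of the example data can be sketched as below. The code assumes the convention used in the example (Q1 and Q3 are the medians of the values below and above the median). Other quartile conventions exist and may give slightly different values for other data sets.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # values below the median position
    upper = ordered[half + n % 2:]   # values above the median position
    return median(lower), median(ordered), median(upper)

data = [2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24]
q1, q2, q3 = quartiles(data)
print(max(data) - min(data))  # 22 (range)
print(q1, q2, q3, q3 - q1)    # 6 15 20 14
```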
Range of grouped data
Range of grouped data = largest class mark – smallest class mark
Example:
Number of goals     0 – 4   5 – 9   10 – 14   15 – 19   20 – 24   25 – 29
Number of players     12      9        4         3         1         1
Largest class mark = (25 + 29)/2 = 27 goals
Smallest class mark = (0 + 4)/2 = 2 goals
Hence, the range = 27 – 2 = 25 goals
Interquartile range of grouped data from cumulative frequency table.
Construct the cumulative frequency table first.
Number of durians      20-29   30-39   40-49   50-59   60-69   70-79
Frequency                2       5      11      16      10       4
Cumulative frequency     2       7      18      34      44      48
Q1 = (1/4 × 48)th value = 12th value
Q3 = (3/4 × 48)th value = 36th value
Thus, Q1 lies within the class 40-49 and Q3 lies within the class 60-69.
Since the cumulative frequency before the class 40-49 is 7, Q1 is the fifth value within the class 40-49.
The size of the class 40-49 = 49.5 – 39.5 = 10
Assume that the 11 values are distributed evenly within the class. Therefore, the distance between two consecutive values = 10/11.
Hence, Q1 = 39.5 + 5 × (10/11) ≈ 44.05
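Since the cumulative frequency before the class 60-69 is 34, Q3 is the second value within the class 60-69.
The size of the class 60-69 = 69.5 – 59.5 = 10
Assuming that the 10 values are distributed evenly within the class, the distance between two consecutive values = 10/10 = 1.
Hence, Q3 = 59.5 + 2 × 1 = 61.5
Therefore, the interquartile range = Q3 – Q1 = 61.5 – 44.05 = 17.45 durians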
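The interpolation for the durian data can be sketched as follows, assuming the values are spread evenly within each class. The function name `grouped_quantile` and its parameters are illustrative. Note that the class 60-69 holds 10 values, so the spacing there is 10/10 = 1 and Q3 works out to 61.5.

```python
def grouped_quantile(lower_boundaries, freqs, width, position):
    """Value at a given position (e.g. N/4), assuming an even spread within each class."""
    cumulative = 0
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= position:
            # Step (position - cumulative) values into this class.
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("position exceeds the total frequency")

lower_boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
freqs = [2, 5, 11, 16, 10, 4]
N = sum(freqs)  # 48

q1 = grouped_quantile(lower_boundaries, freqs, 10, N / 4)      # 12th value
q3 = grouped_quantile(lower_boundaries, freqs, 10, 3 * N / 4)  # 36th value
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))  # 44.05 61.5 17.45
```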
Determining the interquartile range of grouped data from an ogive.
We are given an ogive.
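[Figure: ogive with Q1 = 24.5 mm and Q3 = 31.5 mm marked on the horizontal axis]
Q1 = (N/4)th value = 10th value = 24.5 mm
Q3 = (3N/4)th value = 30th value = 31.5 mm
Hence, the interquartile range = 31.5 – 24.5 = 7.0 mm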
Variance
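Variance of ungrouped data:
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N}$ or $\sigma^2 = \frac{\sum x^2}{N} - \bar{x}^2$, where $\bar{x} = \frac{\sum x}{N}$
Variance of grouped data:
$\sigma^2 = \frac{\sum f(x - \bar{x})^2}{\sum f} = \frac{\sum fx^2}{\sum f} - \bar{x}^2$, where f = frequency of each class, x = class mark and $\bar{x} = \frac{\sum fx}{\sum f}$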
Standard deviation
Standard deviation is a statistical measurement that measures how much the values in a set of data are scattered around the mean. It is defined as the positive square root of the variance.
Standard deviation is a good measure of dispersion because it has the same unit as the values of the data whereas variance has a unit which is the square of the unit of the values of the data.
Example:
Time taken   Class mark, x    f     fx    x²    fx²
5-9                7          8     56    49    392
10-14             12         15    180   144   2160
15-19             17          4     68   289   1156
20-24             22          3     66   484   1452
Mean = Σfx/Σf = 370/30 ≈ 12.33
Variance = Σfx²/Σf – (mean)² = 5160/30 – (370/30)² ≈ 19.89
s.d. = √19.89 ≈ 4.46
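The column totals and the three results for this table can be verified with a short sketch. The variable names are illustrative.

```python
import math

marks = [7, 12, 17, 22]   # class marks x
freqs = [8, 15, 4, 3]     # frequencies f

total_f = sum(freqs)                                     # 30
sum_fx = sum(f * x for f, x in zip(freqs, marks))        # 370
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))   # 5160

mean = sum_fx / total_f
variance = sum_fx2 / total_f - mean ** 2   # sum(f x^2)/sum(f) - mean^2
sd = math.sqrt(variance)
print(round(mean, 2), round(variance, 2), round(sd, 2))  # 12.33 19.89 4.46
```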
Effects on measures of dispersion when:
If every value of the data is changed uniformly, i.e. when every value in a set of data is multiplied by a constant quantity k, then we have
new range = k × original range
new interquartile range = k × original interquartile range
new s.d. = k × original standard deviation
new variance = k² × original variance
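These scaling rules can be checked numerically. The sketch below reuses the ungrouped data from the range example with a hypothetical multiplier k = 3 (any positive constant would do), and relies on `statistics.pstdev` and `statistics.pvariance` for the population standard deviation and variance. The interquartile range is omitted for brevity.

```python
import math
import statistics

def dispersion(values):
    """Range, population s.d. and population variance of the data."""
    return {
        "range": max(values) - min(values),
        "sd": statistics.pstdev(values),
        "variance": statistics.pvariance(values),
    }

data = [2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24]
k = 3  # hypothetical multiplier, chosen only for illustration

before = dispersion(data)
after = dispersion([k * x for x in data])

print(after["range"] == k * before["range"])                         # True
print(math.isclose(after["sd"], k * before["sd"]))                   # True
print(math.isclose(after["variance"], k ** 2 * before["variance"]))  # True
```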
If there are extreme values in the set of data, this will significantly increase the range of the set of data but have little or no effect on the interquartile range.
Extreme values also significantly increase the standard deviation and variance, but the s.d. is affected to a smaller degree than the variance.
Hence, the interquartile range largely eliminates the effect of extreme values.
The other measures of dispersion are affected to different degrees by extreme values.
If certain values are added or removed from a set of data, the effect on the measures of dispersion is uncertain.
In general, the range and the interquartile range are less affected as compared to the variance and the standard deviation.
Variance and standard deviation are more significantly affected when the added or removed value differs greatly from the mean.
Comparing the measures of central tendency and dispersion
There are many situations where we need to compare two or more sets of data and subsequently make a conclusion.
The measure of central tendency may not provide us with enough information for comparison.
We need to determine the measures of dispersion of a set of data to provide us with a better picture of the characteristics of the set of data and eventually help us arrive at a more meaningful and acceptable conclusion.
Team   Marks
P      60, 65, 85, 76, 64, 88
Q      68, 62, 76, 80, 81, 71
A teacher would like to select one of these teams to represent the school in a Mathematics quiz. The teacher is more concerned about a steady performance of the team in the quiz. Which team would the teacher select?
Team P: mean = 73, s.d. = 10.74
Team Q: mean = 73, s.d. = 6.73
Both teams have the same mean, meaning that they are considered equally good statistically. However, Team Q has a lower s.d., implying a smaller difference in performance between the members of the team. Hence, Team Q is expected to perform more consistently in the quiz than Team P. Therefore, the teacher would select Team Q.
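The comparison can be reproduced with the standard `statistics` module. This is a sketch that assumes the population standard deviation (`pstdev`), which matches the values quoted above.

```python
import statistics

team_p = [60, 65, 85, 76, 64, 88]
team_q = [68, 62, 76, 80, 81, 71]

for name, marks in (("P", team_p), ("Q", team_q)):
    # Same mean for both teams; the s.d. shows which team is more consistent.
    print(name, statistics.mean(marks), round(statistics.pstdev(marks), 2))
# P 73 10.74
# Q 73 6.73
```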
Thank you for your patience and attention.
THE END