stats form 4

TOPIC 7 : STATISTICS (FORM 4)

Presented by

Nur Shakirin Sabri

Nur Ayuni Daud

Nornaimah Rodzi

Statistics

Measures of central tendency

Mean

Median

Mode

Measures of dispersion

Range

Quartiles

Variance and standard deviation

Ungrouped and Grouped Data

Ungrouped data is where the values are handled individually.

e.g. 1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17

Grouped data is where the values are grouped into classes, which is useful when we collect a large amount of data with varying values, e.g.

Circumference (cm)    1 - 2    3 - 4    5 - 6    7 - 8
No. of branches       2        10       14       8

Mean

The mean of an ungrouped set of data is the sum of all the values in the set divided by the number of values.

Let x be any value in the set of data, N be the number of values and $\bar{x}$ be the mean of the set of data. Then

$\bar{x} = \frac{\sum x}{N}$

Example: Find the mean of the following data.

1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17

$\bar{x} = \frac{1 + 3 + 6 + 6 + 6 + 6 + 7 + 7 + 12 + 12 + 17}{11} = \frac{83}{11} = 7.545$
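The arithmetic in this example can be checked with a short Python sketch. The function name `mean` is an illustrative choice, not something taken from the slides.

```python
# Ungrouped data taken from the example above.
data = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]


def mean(values):
    """Return the sum of the values divided by how many values there are."""
    return sum(values) / len(values)


print(sum(data), len(data))   # 83 11
print(round(mean(data), 3))   # 7.545
```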

Mean of grouped data

For grouped data, we take the midpoint of the class, known as the class mark, to represent the class.

Let f be the frequency of each class and x be the corresponding class mark. Then

$\bar{x} = \frac{\sum fx}{\sum f}$

Example:

Circumference (cm)    1 - 2            3 - 4            5 - 6            7 - 8
No. of branches, f    2                10               14               8
Midpoint, x           (1+2)/2 = 1.5    (3+4)/2 = 3.5    (5+6)/2 = 5.5    (7+8)/2 = 7.5

$\bar{x} = \frac{(1.5)(2) + (3.5)(10) + (5.5)(14) + (7.5)(8)}{2 + 10 + 14 + 8} = \frac{175}{34} = 5.147$
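As a rough check of the grouped mean, the sketch below computes each class mark and then the frequency-weighted average. The helper names `class_mark` and `grouped_mean` are assumptions made for illustration.

```python
# Class limits and frequencies from the circumference table above.
classes = [(1, 2), (3, 4), (5, 6), (7, 8)]
frequencies = [2, 10, 14, 8]


def class_mark(lower, upper):
    """Midpoint of a class, used to represent every value in it."""
    return (lower + upper) / 2


def grouped_mean(classes, frequencies):
    """Sum of f * x over all classes, divided by the total frequency."""
    marks = [class_mark(lower, upper) for lower, upper in classes]
    total_fx = sum(f * x for f, x in zip(frequencies, marks))
    return total_fx / sum(frequencies)


print(round(grouped_mean(classes, frequencies), 3))  # 5.147
```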

Median

When the values of a set of data are arranged in ascending or descending order, the value that lies in the middle is called the median.

If n (the number of values) is odd, the median is the $\frac{n + 1}{2}$th value.

If n is even, the median is the mean of the $\frac{n}{2}$th value and the $\left(\frac{n}{2} + 1\right)$th value.

Example: 1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17

Here n = 11, so the median is the $\frac{11 + 1}{2}$th = 6th value, which is 7.
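The odd and even cases can be sketched in Python as follows. The function name `median_of` is hypothetical, and the even-length list is borrowed from a later example in these slides.

```python
def median_of(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Index `middle` is the (n + 1) / 2-th value when counting from 1.
        return ordered[middle]
    # Mean of the (n / 2)-th and (n / 2 + 1)-th values when counting from 1.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17]))  # 7
print(median_of([42, 30, 39, 40, 35, 30]))              # 37.0
```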

Mode

Mode is the value that occurs the most frequently in a set of data.

Example: 1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17

Mode = 6

Modal Class of Grouped Data

The class of grouped data that has the highest frequency is known as the modal class.

Example:

Circumference (cm)    10-19    20-29    30-39    40-49
No. of trees          14       19       23       10

Q: Determine the modal class.

A: The class having the highest frequency is 30-39 cm. Hence, the modal class is 30-39 cm.
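Both ideas, the mode of ungrouped data and the modal class of grouped data, amount to finding the highest count. A minimal Python sketch follows; the function names and the dictionary layout of the table are assumptions for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest count."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


def modal_class(frequency_table):
    """Return the class label with the highest frequency."""
    return max(frequency_table, key=frequency_table.get)


data = [1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17]
trees = {"10-19": 14, "20-29": 19, "30-39": 23, "40-49": 10}

print(modes(data))         # [6]
print(modal_class(trees))  # 30-39
```

Returning a list from `modes` is a design choice: a set of data can have more than one mode when several values tie for the highest count.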

Finding the mode from histogram

If the distribution of a set of grouped data is represented by a histogram, we can estimate the mode of the distribution from the histogram.

Steps to find the mode from a histogram:

1. Determine the modal class in the histogram.

2. Join the top left vertex of the modal class bar to the top left vertex of the bar on its right, and the top right vertex of the modal class bar to the top right vertex of the bar on its left.

3. Read the value on the horizontal axis directly below the intersection of the two lines. This value is the estimated mode.

[Figure: histogram with the modal class and the estimated mode marked]

Median from cumulative frequency distribution table

The median of a grouped set of data can be calculated from the cumulative frequency distribution table using the following formula:

$\text{Median} = L + \left( \frac{\frac{N}{2} - F}{f_m} \right) C$

where
L = lower boundary of the class in which the median lies
N = total frequency
F = cumulative frequency before the class in which the median lies
C = size of the class interval
$f_m$ = frequency of the class in which the median lies

Example: Calculate the median.

Circumference (cm)         1 - 2    3 - 4    5 - 6    7 - 8
No. of branches, f         2        10       14       8
Cumulative frequency, F    2        12       26       34

The median lies between the 17th and 18th branches, which is within the class 5 - 6.

Cumulative frequency before the class 5 - 6, F = 12
Lower boundary, L = 4.5
Class interval, C = 2
Frequency of the class, $f_m$ = 14
Total frequency, N = 34

$\text{Median} = 4.5 + \left( \frac{17 - 12}{14} \right)(2) = 5.21$ cm
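A minimal sketch of the interpolation, assuming the usual form Median = L + ((N/2 - F) / fm) × C with the symbols defined in this section. The function and parameter names are illustrative only.

```python
def grouped_median(lower_boundary, total, cumulative_before, class_frequency, class_size):
    """Median = L + ((N / 2 - F) / fm) * C for the class containing the median."""
    return lower_boundary + (total / 2 - cumulative_before) / class_frequency * class_size


# Values read from the branch circumference table (median class 5 - 6).
median = grouped_median(
    lower_boundary=4.5,
    total=34,
    cumulative_before=12,
    class_frequency=14,
    class_size=2,
)
print(round(median, 2))  # 5.21
```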

Estimating the median of grouped data from an ogive.

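We can use an ogive to estimate the median of grouped data. The median is the value on the horizontal axis that corresponds to half of the total frequency on the cumulative frequency axis.

[Figure: ogive with the median marked on the horizontal axis]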

Effects of uniform changes in data on mean, median and mode

Activity: To find the effects on the measures of central tendency if every value of the data is changed uniformly

Given 4, 12, 5, 7, 9, 6, 10, 6, 13

a) Find the mean, median and mode.

b) Add 3 to every value. Find the mean, median and mode and compare with the answers in (a).

c) Multiply every value by 2. Find the mean, median and mode and compare with the answers in (a).

d) Discuss the findings.


From the activity, we find that if u is the original measure of central tendency (mean, median or mode) and v is its new value after a uniform change in every value of the data, then we have

v = u + k if k is added to every value of the data.

v = cu if every value of the data is multiplied by c.
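The activity can be worked through with Python's standard `statistics` module. This is only a sketch; the helper name `summary` is an assumption.

```python
from statistics import mean, median, mode

data = [4, 12, 5, 7, 9, 6, 10, 6, 13]


def summary(values):
    """Return (mean, median, mode) of the values."""
    return mean(values), median(values), mode(values)


original = summary(data)                    # mean 8, median 7, mode 6
plus_three = summary([x + 3 for x in data])
times_two = summary([2 * x for x in data])

# v = u + k when k = 3 is added to every value.
print(plus_three == tuple(u + 3 for u in original))  # True
# v = cu when every value is multiplied by c = 2.
print(times_two == tuple(2 * u for u in original))   # True
```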

Extreme values in the data

30, 28, 120, 43, 35, 9

The numbers 9 and 120 are considered extreme values in the set of data.

Extreme values significantly affect the mean of the data but have little or no effect on the median and mode.

Effect of adding or removing a value from a set of data.

The effect is uncertain.

Generally, the mean or median will shift to a higher value when values greater than the original mean or median are added, or when values smaller than the original mean or median are removed.

However, the mean, mode or median will remain unchanged if the value added or removed is equal to the corresponding measure of central tendency.

Determining the most suitable measure of central tendency.

Question: Determine the most suitable measure of central tendency for each of these sets of data.

1. 42, 30, 39, 40, 35, 30 Mean = 36, Median = 37, Mode = 30

2. 42, 30, 39, 40, 35, 30, 120 Mean = 48, Median = 39, Mode = 30

Answer:

1. Mean and median

2. Median
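The figures quoted in the question can be reproduced with the sketch below. It also shows how the single extreme value 120 moves the mean by 12 but the median by only 2. The variable names are illustrative.

```python
from statistics import mean, median, mode

without_extreme = [42, 30, 39, 40, 35, 30]
with_extreme = without_extreme + [120]

for label, values in (("without 120", without_extreme), ("with 120", with_extreme)):
    print(label, mean(values), median(values), mode(values))

mean_shift = mean(with_extreme) - mean(without_extreme)
median_shift = median(with_extreme) - median(without_extreme)
print(mean_shift > median_shift)  # True
```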


The mean is widely used because all the values in a set of data are taken into account.

However, if a set of data contains extreme values, the mean can be pulled away from the centre of the data and may not represent the general characteristics of the data.

Under such circumstances, the median is a better measure of central tendency because it is not affected by the size of the extreme values in the set of data.

The mode is usually used for a large set of data in which the values take only a few specific values and are repeated many times.

Range

Range of ungrouped data = largest value – smallest value

Example: Find the range of the data.

2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24

Answer: Range = 24 – 2 = 22

Interquartile range of ungrouped data

Quartiles are three values in a set of data which divide the data into four quarters with each quarter having the same number of values.

Example: 2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24

2, 4, [6], 7, 10, [15], 16, 19, [20], 21, 24

Q1 = 6 (lower quartile), Q2 = 15 (median), Q3 = 20 (upper quartile), with 2 numbers in each of the four quarters.

Interquartile range = Q3 – Q1 = 20 – 6 = 14
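The quartiles in this example can be found by taking the median of each half of the sorted data, leaving out the overall median when the number of values is odd. The sketch below assumes that convention, and the function name `quartiles` is hypothetical. Other conventions exist and can give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # values below the median position
    upper_half = ordered[-half:]   # values above the median position
    return median(lower_half), median(ordered), median(upper_half)


data = [2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24]
q1, q2, q3 = quartiles(data)

print(q1, q2, q3)             # 6 15 20
print(q3 - q1)                # 14
print(max(data) - min(data))  # 22
```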

Range of grouped data

Range of grouped data = largest class mark – smallest class mark

Example:

Number of goals      0 – 4    5 – 9    10 – 14    15 – 19    20 – 24    25 – 29
Number of players    12       9        4          3          1          1

Largest class mark = $\frac{25 + 29}{2}$ = 27 goals

Smallest class mark = $\frac{0 + 4}{2}$ = 2 goals

Hence, the range = 27 – 2 = 25 goals
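A short sketch of the same calculation, using the class limits from the goals table. The variable names are illustrative.

```python
# Class limits from the "Number of goals" table.
goal_classes = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]

class_marks = [(lower + upper) / 2 for lower, upper in goal_classes]
grouped_range = max(class_marks) - min(class_marks)

print(class_marks)    # [2.0, 7.0, 12.0, 17.0, 22.0, 27.0]
print(grouped_range)  # 25.0
```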

Interquartile range of grouped data from cumulative frequency table.

Construct the cumulative frequency table first.

Number of durians       20-29    30-39    40-49    50-59    60-69    70-79
Frequency               2        5        11       16       10       4
Cumulative frequency    2        7        18       34       44       48

Q1 = $\left(\frac{1}{4} \times 48\right)$th value = 12th value

Q3 = $\left(\frac{3}{4} \times 48\right)$th value = 36th value

Thus, Q1 lies within the class 40-49 and Q3 lies within the class 60-69.

Since the cumulative frequency before the class 40-49 is 7, Q1 is the fifth value within the class 40-49.

The size of the class 40-49 = 49.5 – 39.5 = 10

Assume that the 11 values are distributed evenly within the class. Then the size between two values = $\frac{10}{11}$

Hence, Q1 = $39.5 + 5 \times \frac{10}{11}$ = 44.05


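Since the cumulative frequency before the class 60-69 is 34, Q3 is the second value within the class 60-69.

The size of the class 60-69 = 69.5 – 59.5 = 10

Assume that the 10 values are distributed evenly within the class. Then the size between two values = $\frac{10}{10}$ = 1

Hence, Q3 = $59.5 + 2 \times \frac{10}{10}$ = 61.5

Therefore, the interquartile range = Q3 – Q1 = 61.5 – 44.05 = 17.45 durians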
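The interpolation for both quartiles can be sketched with one helper. It assumes integer class limits, so each boundary lies 0.5 beyond the limit, and values spread evenly within a class. The name `grouped_quantile` is hypothetical. With 10 values in the class 60-69, Q3 works out to 59.5 + (2/10)(10) = 61.5.

```python
def grouped_quantile(classes, frequencies, fraction):
    """Estimate the value at `fraction` of the data by interpolating within its class."""
    position = fraction * sum(frequencies)   # e.g. the 12th value for Q1 when N = 48
    cumulative_before = 0
    for (lower, upper), frequency in zip(classes, frequencies):
        if cumulative_before + frequency >= position:
            lower_boundary = lower - 0.5     # assumes integer class limits
            class_size = (upper + 0.5) - lower_boundary
            return lower_boundary + (position - cumulative_before) / frequency * class_size
        cumulative_before += frequency
    raise ValueError("fraction must lie between 0 and 1")


durian_classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
durian_frequencies = [2, 5, 11, 16, 10, 4]

q1 = grouped_quantile(durian_classes, durian_frequencies, 0.25)
q3 = grouped_quantile(durian_classes, durian_frequencies, 0.75)

print(round(q1, 2))       # 44.05
print(round(q3, 2))       # 61.5
print(round(q3 - q1, 2))  # 17.45
```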


Determining the interquartile range of grouped data from an ogive.

We are given an ogive.

[Figure: ogive with Q1 = 24.5 mm and Q3 = 31.5 mm marked on the horizontal axis]

Q1 = 10th value = 24.5 mm

Q3 = 30th value = 31.5 mm

Hence, the interquartile range = 31.5 – 24.5 = 7.0 mm

Variance

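Variance of ungrouped data:

$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N}$ or $\sigma^2 = \frac{\sum x^2}{N} - \bar{x}^2$, where $\bar{x} = \frac{\sum x}{N}$

Variance of grouped data:

$\sigma^2 = \frac{\sum fx^2}{\sum f} - \bar{x}^2$, where f = frequency of each class, x = class mark and $\bar{x} = \frac{\sum fx}{\sum f}$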

Standard deviation

Standard deviation is a statistical measure of how much the values in a set of data are scattered around the mean. It is defined as the positive square root of the variance.

Standard deviation is a good measure of dispersion because it has the same unit as the values of the data, whereas the unit of variance is the square of the unit of the values.

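Example:

Time taken    Class mark, x    f     fx     x²     fx²
5-9           7                8     56     49     392
10-14         12               15    180    144    2160
15-19         17               4     68     289    1156
20-24         22               3     66     484    1452
Total                          30    370           5160

Mean: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{370}{30} = 12.33$

Variance: $\sigma^2 = \frac{\sum fx^2}{\sum f} - \bar{x}^2 = \frac{5160}{30} - \left(\frac{370}{30}\right)^2 = 19.89$

s.d.: $\sigma = \sqrt{19.89} = 4.46$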
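The table columns can be turned into the mean, variance and standard deviation with a few lines of Python. This sketch assumes the grouped variance is computed as Σfx²/Σf minus the square of the mean, which is the form the fx and fx² columns suggest. The variable names are illustrative.

```python
from math import sqrt

# Class marks and frequencies from the "Time taken" table.
class_marks = [7, 12, 17, 22]
frequencies = [8, 15, 4, 3]

total_f = sum(frequencies)                                         # 30
total_fx = sum(f * x for f, x in zip(frequencies, class_marks))    # 370
total_fx2 = sum(f * x * x for f, x in zip(frequencies, class_marks))  # 5160

grouped_mean = total_fx / total_f
variance = total_fx2 / total_f - grouped_mean ** 2
std_dev = sqrt(variance)

print(round(grouped_mean, 2), round(variance, 2), round(std_dev, 2))  # 12.33 19.89 4.46
```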

Effects on measures of dispersion when:

If every value of the data is changed uniformly, i.e. when every value in a set of data is multiplied by a positive constant k, then we have

new range = k × original range

new interquartile range = k × original interquartile range

new standard deviation = k × original standard deviation

new variance = k² × original variance
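These four relations can be checked numerically. The sketch below reuses the data from the earlier activity with a hypothetical multiplier k = 2. The helpers `value_range` and `interquartile_range` are illustrative, and the quartiles are taken as medians of the lower and upper halves.

```python
from math import isclose
from statistics import median, pstdev, pvariance

data = [4, 12, 5, 7, 9, 6, 10, 6, 13]
k = 2  # hypothetical multiplier chosen for illustration
scaled = [k * x for x in data]


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 - Q1, with quartiles taken as medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])


print(value_range(scaled) == k * value_range(data))                   # True
print(interquartile_range(scaled) == k * interquartile_range(data))   # True
print(isclose(pstdev(scaled), k * pstdev(data)))                      # True
print(isclose(pvariance(scaled), k ** 2 * pvariance(data)))           # True
```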


If there are extreme values in the set of data, they significantly increase the range but have little or no effect on the interquartile range.

Extreme values also significantly increase the standard deviation and the variance, but the standard deviation is affected to a smaller degree than the variance.

Hence, the interquartile range eliminates the effect of extreme values.

The other measures of dispersion are affected to different degrees by extreme values.


If certain values are added or removed from a set of data, the effect on the measures of dispersion is uncertain.

In general, the range and the interquartile range are less affected as compared to the variance and the standard deviation.

Variance and standard deviation are more significantly affected when the added or removed value differs greatly from the mean.

Comparing the measures of central tendency and dispersion

There are many situations where we need to compare two or more sets of data and subsequently make a conclusion.

The measure of central tendency may not provide us with enough information for comparison.

We need to determine the measures of dispersion of a set of data to provide us with a better picture of the characteristics of the set of data and eventually help us arrive at a more meaningful and acceptable conclusion.

Team    Marks
P       60, 65, 85, 76, 64, 88
Q       68, 62, 76, 80, 81, 71

A teacher would like to select one of these teams to represent the school in a Mathematics quiz. The teacher is more concerned about a steady performance of the team in the quiz. Which team would the teacher select?

Team P: $\bar{x} = \frac{438}{6} = 73$, $\sigma = \sqrt{\frac{692}{6}} = 10.74$

Team Q: $\bar{x} = \frac{438}{6} = 73$, $\sigma = \sqrt{\frac{272}{6}} = 6.73$

Both teams have the same mean, meaning that they are considered equally good statistically. However, team Q has a lower s.d., implying a smaller difference in performance between the members of the team. Hence, team Q is expected to perform more consistently in the quiz than team P. Therefore, the teacher would select team Q.
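The comparison can be reproduced with the standard `statistics` module. `pstdev` divides by the number of values, which matches the standard deviations quoted for the two teams. The variable names are illustrative.

```python
from statistics import mean, pstdev

team_p = [60, 65, 85, 76, 64, 88]
team_q = [68, 62, 76, 80, 81, 71]

for name, marks in (("P", team_p), ("Q", team_q)):
    print(name, mean(marks), round(pstdev(marks), 2))

# The team with the smaller standard deviation is the steadier one.
steadier = "P" if pstdev(team_p) < pstdev(team_q) else "Q"
print(steadier)  # Q
```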

Thank you for your patience and attention.

THE END
