Data Analysis and Interpretation 1: Basic Statistics
Lecturer: Yee Bee Choo
IPGKTHO
Topic 8
Basic Statistics
Measure of Central Tendency
Mode
Median
Mean
Measure of Dispersion
Range
Variance & Standard Deviation
Standard Score
Z Score T Score
Two kinds of measures:
1. Measures of central tendency
2. Measures of dispersion
Both types of measure are useful in score reporting and are frequently used to describe data. They are often called descriptive statistics because they help you describe your data.
Measures of central tendency describe the value around which a set of scores clusters.
There are three major measures of central tendency:
1. Mode
2. Median
3. Mean
Measure of Central Tendency
Mode
◦ The “mode” for a set of data is the number (or item) that occurs most frequently.
◦ Sometimes data can have more than one mode. This happens when two or more numbers (or items) tie for the highest number of occurrences in the data.
◦ A data set with two modes is called bimodal.
◦ A data set with three modes is called trimodal.
◦ It is also possible to have a set of data with no mode.
The mode is the most common number. To find the mode, put the numbers in order and choose the number that appears most frequently.
Data: 3, 5, 5, 6, 4, 3, 2, 1, 5, 6
Put in order: 1, 2, 3, 3, 4, 5, 5, 5, 6, 6
The mode is 5.
Bimodal
Data: 2, 5, 2, 3, 5, 4, 7
Put in order: 2, 2, 3, 4, 5, 5, 7
Modes = 2 and 5
Trimodal
Data: 2, 5, 2, 7, 5, 4, 7
Put in order: 2, 2, 4, 5, 5, 7, 7
Modes = 2, 5, and 7
Data: 3, 5, 6, 4, 7, 8, 9, 2, 1, 0
Put in order: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
What is the mode? Is the mode 0?
No. Every number occurs exactly once, so this data set has no mode.
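The mode examples above can be checked with a short Python sketch. The function name `modes` and the choice to return an empty list for "no mode" are illustrative assumptions, not part of the lecture.

```python
from collections import Counter

def modes(data):
    """Return every value that occurs most often; [] when there is no mode."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    # Assumption: when all distinct values occur equally often, report "no mode".
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([3, 5, 5, 6, 4, 3, 2, 1, 5, 6]))  # [5]
print(modes([2, 5, 2, 3, 5, 4, 7]))           # [2, 5]      (bimodal)
print(modes([2, 5, 2, 7, 5, 4, 7]))           # [2, 5, 7]   (trimodal)
print(modes([3, 5, 6, 4, 7, 8, 9, 2, 1, 0]))  # []          (no mode)
```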
The mode can be useful for dealing with categorical data. For example, if a sandwich shop sells 10 different types of sandwiches, the mode would represent the most popular sandwich.
The mode can be useful for summarising survey data or election votes.
Median
A median is a measure of the "middle" value of a set of data.
To find the median, put the numbers in order and find the middle number.
If the total number of values in the sample is even, the median is calculated by finding the mean of the two values in the middle.
Data: 45, 47, 50, 51, 52, 54, 65
Median = 51
Data: 45, 47, 50, 51, 52, 53, 54, 65
Median = (51 + 52)/2 = 51.5
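The two median cases (odd and even number of values) can be sketched in Python as follows. The function name `median` is illustrative; Python's standard `statistics.median` behaves the same way.

```python
def median(data):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([45, 47, 50, 51, 52, 54, 65]))      # 51
print(median([45, 47, 50, 51, 52, 53, 54, 65]))  # 51.5
```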
Mean
The ‘Mean’ is the ‘Average’ value of numerical data.
The Mean (or average) is found by adding all scores together and dividing by the number of scores.
Data: 3, 5, 5, 6, 4, 3, 2, 1, 5, 6
Add up the numbers:
3 + 5 + 5 + 6 + 4 + 3 + 2 + 1 + 5 + 6 = 40
Divide by how many numbers:
40 ÷ 10 = 4
Mean = 4
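The same two steps (add up, then divide by the count) look like this in Python. The function name `mean` is illustrative.

```python
def mean(data):
    """Sum of the scores divided by the number of scores."""
    return sum(data) / len(data)

scores = [3, 5, 5, 6, 4, 3, 2, 1, 5, 6]
print(sum(scores))   # 40
print(len(scores))   # 10
print(mean(scores))  # 4.0
```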
Exercise 1
Below is a set of marks obtained by 7 students:
82 55 73 48 88 67 67
Find the mean, mode and median.
Exercise 2
On a standardised reading test, the nationwide average for Year 3 pupils is 7.0. A teacher is interested in comparing class reading scores with the national average. The scores for the 16 pupils in this class are as follows:
8, 6, 5, 10, 5, 6, 8, 9,
7, 6, 9, 5, 14, 4, 7, 6
a) Find the mean and the median reading scores for this class.
b) If the mean is used to define the class average, how does this class compare with the national norm?
c) If the median is used to define the class average, how does this class compare with the national norm?
Measures of dispersion describe the spread of scores in a data set.
There are three major measures of dispersion:
1. Range
2. Standard deviation
3. Variance
Measure of Dispersion
Consider these means for weekly candy bar consumption.
Set 1: X = {7, 8, 6, 7, 7, 6, 8, 7}
X̄ = (7+8+6+7+7+6+8+7)/8
X̄ = 7
Set 2: X = {12, 2, 0, 14, 10, 9, 5, 4}
X̄ = (12+2+0+14+10+9+5+4)/8
X̄ = 7
Both sets have the same mean. What is the difference?
How well does the mean represent the scores in a distribution?
The logic here is to determine how much spread is in the scores. How much do the scores "deviate" from the mean? Think of the mean as the true score or as your best guess. If every X were very close to the Mean, the mean would be a very good predictor.
If the distribution is very sharply peaked then the mean is a good measure of central tendency and if you were to use the mean to make predictions you would be right or close much of the time.
Range
A range represents the distance on a numeric scale from the minimum to the maximum.
You can calculate the range by subtracting the minimum value from the maximum value.
Range = maximum - minimum
If the maximum grade was 100 and the minimum was 55, the range would be
Range = 100 - 55 = 45.
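As a quick illustration, the range can be computed in Python for the grade example and for the two candy bar data sets shown earlier, which share a mean of 7 but differ in spread. The names `value_range`, `set_1` and `set_2` are illustrative.

```python
def value_range(data):
    """Range = maximum - minimum."""
    return max(data) - min(data)

set_1 = [7, 8, 6, 7, 7, 6, 8, 7]
set_2 = [12, 2, 0, 14, 10, 9, 5, 4]

print(value_range([100, 55]))  # 45
print(value_range(set_1))      # 2
print(value_range(set_2))      # 14
```

The two candy bar sets have the same mean, yet the range of the second set is seven times larger.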
Variance & Standard Deviation
The variance and standard deviation describe how far from or close to the mean (or average) the observations in a data set lie.
The variance is the average of the squared deviations from the mean: the sum of the squared deviations divided by the number of scores.
The standard deviation is the square root of the variance. It expresses the typical deviation from the mean in the same units as the original scores.
Formulae:
Variance:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$
Standard Deviation:
$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$
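The following Python sketch applies the variance and standard deviation formulae, with N in the denominator, to the two candy bar data sets shown earlier. The function and variable names are illustrative.

```python
import math

def variance(scores):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n

def standard_deviation(scores):
    """Square root of the variance."""
    return math.sqrt(variance(scores))

set_1 = [7, 8, 6, 7, 7, 6, 8, 7]
set_2 = [12, 2, 0, 14, 10, 9, 5, 4]

print(variance(set_1), round(standard_deviation(set_1), 2))  # 0.5 0.71
print(variance(set_2), round(standard_deviation(set_2), 2))  # 21.75 4.66
```

Both sets have a mean of 7, but the much larger variance of the second set shows that its scores lie further from that mean.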
Standard Deviation
Standard deviation refers to how much the scores deviate from the mean.
There are two methods of calculating standard deviation: the deviation method and the raw score method. Both are illustrated by the following formulae.
Standard Deviation (Deviation Method)
To illustrate this, we will use the scores 20, 25 and 30.
Using the deviation method, we come up with the following table:
Standard Deviation (Raw Score Method)
Using the raw score method, we come up with the following:
Both methods result in the same final value of 5. Note that 5 is the result of dividing the sum of squared deviations (50) by N - 1 = 2; dividing by N = 3 would give about 4.08.
If you are calculating standard deviation with a calculator, it is suggested that the deviation method be used when there are only a few scores and the raw score method be used when there are many scores.
This is because, when there are many scores, it is tedious to calculate each squared deviation and then their sum.
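The two methods can be compared with a small Python sketch using the scores 20, 25 and 30. The raw score formula below is the usual computational form, (ΣX² - (ΣX)²/N) divided by the denominator; it is an assumption here rather than a transcription of the lecture's own formula. The `ddof` parameter is also an illustrative addition: `ddof=1` divides by N - 1, which reproduces the stated value of 5, while `ddof=0` divides by N.

```python
import math

def sd_deviation_method(scores, ddof=1):
    """Square each deviation from the mean, sum them, divide, take the root."""
    n = len(scores)
    mean = sum(scores) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(sum_sq_dev / (n - ddof))

def sd_raw_score_method(scores, ddof=1):
    """Uses only the sum of scores and the sum of squared scores (assumed form)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - ddof))

scores = [20, 25, 30]
print(sd_deviation_method(scores))                    # 5.0
print(sd_raw_score_method(scores))                    # 5.0
print(round(sd_deviation_method(scores, ddof=0), 2))  # 4.08
```

The raw score method needs only two running totals, which is why it suits long lists of scores.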
Exercise 3
Calculate the range, variance and standard deviation for the following sample.
41, 17, 25, 34, 14, 40, 27, 19, 50, 39
26, 22, 28, 18, 42, 33, 25, 28, 27, 33
34, 7, 12, 36, 34, 16, 49, 19, 40, 28
26, 30, 48, 33, 33, 25, 50, 29, 26, 30
Standard Score
Standardised scores are necessary when we want to make comparisons across tests and measurements.
Z scores and T scores are the more common forms of standardised scores.
A standardised score can be computed for every raw score in a set of scores for a test.
Exercise 4
Consider the two sets of scores below:
A = 10, 36, 38, 40, 42, 44, 70
B = 10, 12, 14, 40, 66, 68, 70
Find the range and mean.
Both set A and set B have the same range and mean. However, set B is more dispersed, so a score of 70 stands out from the other values far more in set A than in set B.
To make the comparison clearer, we can standardise the scores by transforming them into another distribution.
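A short Python sketch makes the difference between the two sets visible. It uses the standard library's `statistics.pstdev`, which divides by N; the helper name `summarise` is illustrative.

```python
import statistics

def summarise(scores):
    """Return (range, mean, population standard deviation) for a list of scores."""
    spread = max(scores) - min(scores)
    return spread, statistics.mean(scores), statistics.pstdev(scores)

set_a = [10, 36, 38, 40, 42, 44, 70]
set_b = [10, 12, 14, 40, 66, 68, 70]

for name, scores in (("A", set_a), ("B", set_b)):
    spread, mean, sd = summarise(scores)
    print(name, spread, mean, round(sd, 2))
```

Range and mean are identical for both sets (60 and 40), while the standard deviation of set B (about 25.97) is clearly larger than that of set A (about 16.21).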
i. Z score
The Z score is the basic standardised score. It is referred to as the basic form because other computations of standardised scores must first calculate the Z score. The formula is as follows:
$$Z = \frac{X - \bar{X}}{SD}$$
where X is the raw score, X̄ is the mean and SD is the standard deviation.
Calculate the Z score for the set of scores below:
25, 34, 40, 45
The mean for this set of scores is 36 and the SD is 8.6.
Table 1:
Raw Score    Application of Formula (Raw score - Mean)/SD    Z Score
25           (25 - 36)/8.6                                   -1.28
34           (34 - 36)/8.6                                   -0.23
40           (40 - 36)/8.6                                    0.47
45           (45 - 36)/8.6                                    1.04
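The Z score calculation in Table 1 can be reproduced with the Python sketch below; the function name `z_score` is illustrative. Note that (45 - 36)/8.6 ≈ 1.0465, which rounds to 1.05, whereas the table lists 1.04 (apparently truncated rather than rounded).

```python
def z_score(raw, mean, sd):
    """(Raw score - Mean) / SD"""
    return (raw - mean) / sd

mean, sd = 36, 8.6  # values given for this set of scores
for raw in [25, 34, 40, 45]:
    print(raw, round(z_score(raw, mean, sd), 2))
```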
Exercise 5
Ahmad obtained 90 marks (out of a total of 100) in an English test. The mean for the achievement of the whole class is 70 and the standard deviation (SD) is 25. In a Mathematics test, Ahmad obtained 60 marks. The mean achievement for Mathematics for the whole class is 40 while the SD is 15. In which subject did Ahmad perform better?
Exercise 6
A distribution of scores has a mean of 70. In this distribution, a score of x=80 is located 10 points above the mean.
a) Calculate z-scores for standard deviation 5 and 20.
b) Sketch the distribution and locate the position of x=80. Compare the two z-scores which correspond to x=80.
Exercise 6 (sketch): two distributions, each with mean 70 and x = 80 marked. With SD = 5, z-score = 2; with SD = 20, z-score = 0.5.
Z score values are very small and usually range only from –2 to 2.
Such small values make them inappropriate for score reporting, especially for those unaccustomed to the concept.
Imagine what a parent may say if his child comes home with a report card with a Z score of 0.80 in English Language!
Fortunately, there is another form of standardised score, the T score, with values that are more palatable to the relevant parties.
ii. T score
The T score is a standardised score which can be computed using the formula T = 10(Z) + 50.
As such, the T scores for students A, B, C and D in Table 1 are as below:
Raw Score    Application of Formula 10(Z) + 50    T Score
25           10(-1.28) + 50                       37.2
34           10(-0.23) + 50                       47.7
40           10(0.47) + 50                        54.7
45           10(1.04) + 50                        60.4
These values seem far more appropriate for reporting than the Z scores.
The T score average or mean is always 50 (i.e. a Z score of 0), which connotes average ability and the midpoint of a 100-point scale.
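The conversion from Z score to T score is a one-line formula, sketched here in Python with the Z scores listed in Table 1. The function name `t_score` is illustrative.

```python
def t_score(z):
    """T = 10(Z) + 50"""
    return 10 * z + 50

for z in [-1.28, -0.23, 0.47, 1.04]:
    print(z, round(t_score(z), 1))  # 37.2, 47.7, 54.7, 60.4

print(t_score(0))  # 50: a Z score of 0 (the mean) maps to a T score of 50
```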
Interpretation of data
The standardised score is actually a very important score if we want to compare performance across tests and between students. Let us take the following scenario as an example:
How can En. Abu solve this problem? He would have to have standardised scores in order to decide. This would require the following information:
Test 1: mean (X̄) = 42, standard deviation = 7
Test 2: mean (X̄) = 47, standard deviation = 8
Using the information above, En. Abu can find the Z score for each raw score reported as follows:
Table 2: Z Score for Form 2A
Based on Table 2, both Ali and Chong have a negative Z score as their total score for both tests.
However, Chong has a higher Z score total (i.e. –1.07 compared to –1.34) and therefore performed better when we take the performance of all the other students into consideration.
THE NORMAL CURVE
The normal curve is a hypothetical curve that is assumed to represent many naturally occurring phenomena.
Test scores that measure a characteristic such as intelligence, language proficiency or writing ability in a specific population are also expected to follow a normal curve.
The following diagram illustrates what the normal curve looks like.
Figure 1: The normal distribution or Bell curve
The normal curve in Figure 1 is partitioned according to standard deviations (e.g. –4s, –3s, +3s, +4s), which are indicated on the horizontal axis.
The area of the curve between standard deviations is indicated in percentage on the diagram.
For example, the area between the mean (0 standard deviation) and +1 standard deviation is 34.13%.
Similarly, the area between the mean and –1 standard deviation is also 34.13%. As such, the area between –1 and 1 standard deviations is 68.26%.
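These percentages can be checked with a short Python sketch that uses the standard normal cumulative distribution, written with `math.erf` from the standard library. The function names are illustrative. Computed directly, the area between –1 and +1 standard deviations rounds to 68.27%; the figure of 68.26% comes from doubling the already rounded 34.13%.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z values."""
    return normal_cdf(z_high) - normal_cdf(z_low)

print(round(100 * area_between(0, 1), 2))   # 34.13
print(round(100 * area_between(-1, 0), 2))  # 34.13
print(round(100 * area_between(-1, 1), 2))  # 68.27
```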
In using the normal curve, it is important to make a distinction between standard deviation values and standard deviation scores.
A standard deviation value is a constant and is shown on the horizontal axis of the diagram above.
The standard deviation score, on the other hand, is the score obtained when we use the standard deviation formula provided earlier.
So, if we find the score to be 5 as in the earlier example, then the score for the standard deviation value of 1 is 5, for the value of 2 it is 5 x 2 = 10, for the value of 3 it is 15, and so on. Standard deviation values of –1, –2 and –3 will have corresponding negative scores of –5, –10 and –15.
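The relationship between standard deviation values and standard deviation scores is a simple multiplication, sketched below in Python. The function name `sd_score` is illustrative.

```python
def sd_score(sd_value, sd):
    """Score-unit distance from the mean for a given standard deviation value."""
    return sd_value * sd

sd = 5  # standard deviation found in the earlier example
for sd_value in [-3, -2, -1, 1, 2, 3]:
    print(sd_value, sd_score(sd_value, sd))  # -15, -10, -5, 5, 10, 15
```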