Basic Measurement and Statistics in Testing

Outline

Central Tendency and Dispersion

Standardized Scores

Error and Standard Error of Measurement (Sm)

Item Analysis

Central Tendency and Dispersion

Central Tendency

Measures of central tendency are measures of the location of the middle or the center of a distribution. The definition of "middle" or "center" is purposely left somewhat vague so that the term "central tendency" can refer to a wide variety of measures. The mean is the most commonly used measure of central tendency.

Mean

The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.

The formula in summation notation is:

ΣX/N

The mean is a good measure of central tendency for roughly symmetric distributions but can be misleading in skewed distributions, since it can be greatly influenced by scores in the tail. Other statistics such as the median may therefore be more informative for distributions such as reaction time or family income, which are frequently very skewed.
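As a minimal sketch of the formula ΣX/N and of the point about skew, the snippet below computes the mean of two small sets of scores. Both sets are hypothetical and chosen only for illustration; the second differs from the first by a single extreme score in the tail.

```python
from statistics import median


def mean(scores):
    """Arithmetic mean: the sum of all scores divided by the number of scores."""
    return sum(scores) / len(scores)


# Hypothetical scores, not taken from the slides.
symmetric = [4, 5, 6, 7, 8]
skewed = [4, 5, 6, 7, 48]  # same scores, but the largest is replaced by an extreme value

print(mean(symmetric), median(symmetric))  # 6.0 6
print(mean(skewed), median(skewed))        # 14.0 6
```

The single extreme score pulls the mean from 6 to 14, while the median stays at 6.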

Median

The median is the middle of a distribution: half the scores are above the median and half are below the median.

The median is less sensitive to extreme scores than the mean and this makes it a better measure than the mean for highly skewed distributions.

Computation of Median

When there is an odd number of numbers, the median is simply the middle number. For example, the median of 2, 4, and 7 is 4.

When there is an even number of numbers, the median is the mean of the two middle numbers. Thus, the median of the numbers 2, 4, 7, 12 is (4+7)/2 = 5.5.
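The two cases above can be sketched in a few lines of Python. The function name `median_of` is only illustrative; the two calls reuse the numbers from the examples above.

```python
def median_of(scores):
    """Median: middle value for an odd count, mean of the two middle values for an even count."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([2, 4, 7]))      # 4
print(median_of([2, 4, 7, 12]))  # 5.5
```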

Mode

The mode is the most frequently occurring score in a distribution and is used as a measure of central tendency. It is the only measure of central tendency that can be used with nominal data.

The mode is greatly subject to sample fluctuations and is therefore not recommended as the only measure of central tendency. A further disadvantage of the mode is that many distributions have more than one mode. These distributions are called "multimodal."
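A short sketch of finding the mode, including the multimodal case and nominal data. The score lists are hypothetical, and the helper name `modes` is an assumption made for this example.

```python
from collections import Counter


def modes(scores):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(scores)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Hypothetical data for illustration.
print(modes([2, 3, 3, 5, 7]))       # [3]
print(modes([2, 3, 3, 5, 7, 7]))    # [3, 7]  (two modes)
print(modes(["A", "B", "B", "C"]))  # ['B']   (nominal data)
```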

In a normal distribution, the mean, median, and mode are identical.

Spread, Dispersion, Variability

A variable's spread is the degree to which scores on the variable differ from each other. If every score on the variable were about equal, the variable would have very little spread. There are many measures of spread. The distributions shown below have the same mean but differ in spread: The distribution on the bottom is more spread out.

Variability and dispersion are synonyms for spread.

Spread/Dispersion

Range

The range is the simplest measure of spread or dispersion: It is equal to the difference between the largest and the smallest values.

The range can be a useful measure of spread because it is so easily understood. However, it is very sensitive to extreme scores since it is based on only two values.

The range should almost never be used as the only measure of spread, but can be informative if used as a supplement to other measures of spread.

Example: The range of the numbers 1, 2, 4, 6, 12, 15, 19, 26 is 26 - 1 = 25.
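The range computation, and its sensitivity to a single extreme score, can be sketched as follows. The first list is the example above; the second, with 26 replaced by 100, is a hypothetical variation.

```python
def score_range(scores):
    """Range: difference between the largest and the smallest value."""
    return max(scores) - min(scores)


print(score_range([1, 2, 4, 6, 12, 15, 19, 26]))   # 25
# Hypothetical variation: one extreme score changes the range sharply.
print(score_range([1, 2, 4, 6, 12, 15, 19, 100]))  # 99
```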

Variance

The variance is a measure of how spread out a distribution is. In other words, it is a measure of variability.

The variance is computed as the average squared deviation of each number from the mean.

For example, for the numbers 1, 2, and 3, the mean is 2 and the variance is:

[(1 - 2)² + (2 - 2)² + (3 - 2)²] / 3 = 2/3 ≈ 0.667
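The same calculation can be sketched in Python. This follows the definition given here (dividing by N, the number of scores); the function name is illustrative.

```python
def variance(scores):
    """Average squared deviation of each score from the mean (divides by N)."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n


print(round(variance([1, 2, 3]), 3))  # 0.667
```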

Example of Calculation

Standard Deviation

The standard deviation formula is very simple: it is the square root of the variance. It is the most commonly used measure of spread.

In a normal distribution, about 68% of the scores are within one standard deviation of the mean and about 95% of the scores are within two standard deviations of the mean.
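As a rough check of these percentages, the sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8) to compute the proportion of a normal distribution lying within k standard deviations of the mean. The helper name `proportion_within` is an assumption for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(round(proportion_within(1), 4))  # 0.6827
print(round(proportion_within(2), 4))  # 0.9545
```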

The standard deviation has proven to be an extremely useful measure of spread in part because it is mathematically tractable. Many formulas in inferential statistics use the standard deviation.

Different ways of calculating the standard deviation – the raw score method and the deviation method

Standard deviation score and standard deviation value
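The slides name the raw score method and the deviation method without giving their formulas. The sketch below assumes the usual textbook forms, both dividing by N to match the variance definition used earlier: the deviation method takes the square root of Σ(X - mean)²/N, and the raw score method takes the square root of ΣX²/N - (ΣX/N)². The function names are illustrative.

```python
import math


def sd_deviation_method(scores):
    """Square root of the mean squared deviation from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / n)


def sd_raw_score_method(scores):
    """Same quantity computed from the raw sums of X and X squared."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return math.sqrt(sum_x2 / n - (sum_x / n) ** 2)


scores = [1, 2, 3]
print(round(sd_deviation_method(scores), 3))  # 0.816
print(round(sd_raw_score_method(scores), 3))  # 0.816
```

Both methods give the same value, the square root of the variance 0.667 computed earlier for these numbers.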

Standardized Scores

Z scores and T scores and their uses

Standardized Scores: Z scores

Z-score = (Raw score - mean score) / standard deviation

Example (D is the deviation of the raw score X from the mean):

ID   X    Mean   D    StdDv   Z
1    95   90     5    5       1
2    90   90     0    5       0
3    85   90     -5   5       -1
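The table above can be reproduced with a short sketch. The mean (90) and standard deviation (5) are taken as given from the table; the function name `z_score` is illustrative.

```python
def z_score(raw, mean, sd):
    """Z-score: deviation of the raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd


for raw in (95, 90, 85):
    print(raw, z_score(raw, mean=90, sd=5))
# 95 1.0
# 90 0.0
# 85 -1.0
```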

Standardized Scores: Z scores

Using the Z-score: comparing scores across two tests

Example, compare the previous scores with these:

ID   X   Mean   D       S      Z
1    3   5.67   -2.67   2.45   -1.09
2    6   5.67   0.33    2.45   0.13
3    8   5.67   2.33    2.45   0.95

Standardized scores – T scores

Z scores are unfamiliar to many readers, especially when they are negative.

Formula for T-score: T = 10(Z) + 50

ID   X   Mean   D       Sd     Z       T
1    3   5.67   -2.67   2.45   -1.09   39.1
2    6   5.67   0.33    2.45   0.13    51.3
3    8   5.67   2.33    2.45   0.95    59.5
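A minimal sketch of the Z-to-T conversion, using the mean (5.67) and standard deviation (2.45) exactly as given in the table. Z is rounded to two decimals before T is computed, which is an assumption made here to mirror the table's rounding; the function names are illustrative.

```python
def z_score(raw, mean, sd):
    """Z-score: (raw score - mean) / standard deviation."""
    return (raw - mean) / sd


def t_score(z):
    """T-score: rescales Z to a mean of 50 and a standard deviation of 10."""
    return 10 * z + 50


for raw in (3, 6, 8):
    z = round(z_score(raw, mean=5.67, sd=2.45), 2)  # rounded as in the table
    print(raw, z, round(t_score(z), 1))
# 3 -1.09 39.1
# 6 0.13 51.3
# 8 0.95 59.5
```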

Error and Standard Error of Measurement (Sm)

Every score has an error.

Error either adds to or subtracts from your true score.

True score = Obtained score ± Error

How to calculate error? Sm = SD × √(1 - r), where SD is the standard deviation of the test scores and r is the reliability coefficient of the test.

Example

Obtained score = 20; SD = 2; r = 0.64

Sm = SD × √(1 - r) = 2 × √(1 - 0.64) = 2 × √0.36 = 2 × 0.6 = 1.2

True score = 20 - 1.2 = 18.8 and 20 + 1.2 = 21.2, or between 18.8 and 21.2 (at 1 SEM)
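The worked example can be checked with a short sketch of Sm = SD × √(1 - r). The function and variable names are illustrative; the inputs are the values from the example.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Sm = SD * sqrt(1 - r), with r the reliability coefficient."""
    return sd * math.sqrt(1 - reliability)


obtained = 20
sem = standard_error_of_measurement(sd=2, reliability=0.64)
low, high = obtained - sem, obtained + sem

print(round(sem, 2), round(low, 2), round(high, 2))  # 1.2 18.8 21.2
```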

Item Analysis

Item difficulty

Item discrimination

Distractor analysis

Item difficulty (p)

How difficult is the item?

Sometimes referred to as item facility.

Used only with objective-type tests.

p = number of students who got the item correct divided by the number of students who attempted the item.

Every item has an item difficulty value.

Possible values are from 0 to 1, with 0 indicating a difficult item.

Example

30 students attempted the item.

Option   A   B   C   *D
Count    4   0   8   18

(* marks the key.)

Find p.

p = No. of students who got it right / No. of students who attempted = 18/30 = .60

Note, this is also equal to 60 percent correct.
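A minimal sketch of the item difficulty calculation, using the option counts from the example with D as the key. Storing the counts in a dictionary and the function name are choices made for this illustration.

```python
def item_difficulty(option_counts, key):
    """p = number who chose the key / number who attempted the item."""
    attempted = sum(option_counts.values())
    return option_counts[key] / attempted


counts = {"A": 4, "B": 0, "C": 8, "D": 18}
print(item_difficulty(counts, key="D"))  # 0.6
```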

Item Discrimination (D)

Used to discriminate between good and weak students.

The good and weak students must be determined first.

D is the performance of good students compared to the performance of weak students, divided by the number of students in either group.

Every item has an item discrimination value, which ranges from -1 to 1.

Example

Total number of students = 45

Number of students in Upper Group and Lower Group = 15 each

Options      A   B   C   *D
Upper (Ug)   2   0   3   10
Lower (Lg)   2   1   6   6

Compute D.

D = (No. in Ug correct - No. in Lg correct) / No. of students in either group

D = (10 - 6) / 15 = 0.267
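The discrimination index from this example can be sketched as follows. The dictionaries hold the upper and lower group counts from the example, with D as the key; the function name is illustrative.

```python
def item_discrimination(upper_correct, lower_correct, group_size):
    """D = (correct in upper group - correct in lower group) / size of either group."""
    return (upper_correct - lower_correct) / group_size


upper = {"A": 2, "B": 0, "C": 3, "D": 10}
lower = {"A": 2, "B": 1, "C": 6, "D": 6}

d = item_discrimination(upper["D"], lower["D"], group_size=15)
print(round(d, 3))  # 0.267
```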

Deciding on Good and Bad Items

Item difficulty

Item discrimination

Check for miskeying, ambiguity and guessing

Evidence for miskeying: more chose distractor than key

Guessing: equal spread across options

Ambiguity: equal number chose one distractor and the key
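The three checks above could be sketched as simple flags on an item's option counts. This is only an illustration under loud assumptions: "equal" is taken literally as exactly equal counts, which real data rarely shows, and the function name, flag wording, and all option counts except the first are hypothetical.

```python
def review_flags(option_counts, key):
    """Flag an item for review using the miskeying, guessing and ambiguity checks."""
    key_count = option_counts[key]
    distractor_counts = [c for option, c in option_counts.items() if option != key]
    flags = []
    # Miskeying: some distractor was chosen more often than the key.
    if any(c > key_count for c in distractor_counts):
        flags.append("possible miskeying")
    # Guessing: responses are spread equally across all options (assumed: exactly equal).
    if len(set(option_counts.values())) == 1:
        flags.append("possible guessing")
    # Ambiguity: one distractor was chosen as often as the key (assumed: exactly equal).
    elif any(c == key_count for c in distractor_counts):
        flags.append("possible ambiguity")
    return flags


print(review_flags({"A": 4, "B": 0, "C": 8, "D": 18}, key="D"))   # []
print(review_flags({"A": 12, "B": 1, "C": 2, "D": 5}, key="D"))   # ['possible miskeying']
print(review_flags({"A": 5, "B": 5, "C": 5, "D": 5}, key="D"))    # ['possible guessing']
print(review_flags({"A": 1, "B": 10, "C": 1, "D": 10}, key="D"))  # ['possible ambiguity']
```

In practice a tolerance (for example, counts within a few responses of each other) would likely replace the exact-equality tests.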

END
