1
Chapter 5
Statistical Concepts: Creating New Scores to
Interpret Test Data
2
Norm Referencing vs. Criterion Referencing
Norm-referenced vs. Criterion-referenced Testing
Norm-referenced – scores are compared to a set of test scores called the peer or norm group.
Criterion-referenced – scores are compared to a predetermined standard (criterion).
See Table 5.1, p. 85
3
Normative Comparisons and Derived Scores
Types of Derived scores:
1. percentiles
2. standard scores
3. developmental norms
4
Percentiles
The percentage of people falling below the obtained score. Percentiles range from 1 to 99, with 50 being the median (and, on a normal curve, also the mean)
Do not confuse percentile scores with the term “percentage correct”
(see Figure 5.1)
5
Standard Scores
z-scores
T-scores
Deviation IQs
stanines
sten scores
Normal Curve Equivalents (NCE)
college and graduate school entrance exam scores (e.g., SATs, GREs, and ACTs)
publisher type scores
6
Standard Scores (Cont’d)
z-scores
A standard score that helps us understand where an individual falls on the normal curve
Practically speaking, z-scores run from -4.0 to +4.0
To find a z-score: (X – M)/SD, where X is the raw score, M is the mean, and SD is the standard deviation
See Figure 5.2, p. 88
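The z-score formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the textbook: the function name `z_score` and the example values (raw score 65, mean 50, SD 10) are hypothetical.

```python
def z_score(raw, mean, sd):
    """Return (X - M) / SD for a raw score X."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd

# Hypothetical test: mean of 50, SD of 10, raw score of 65
example_z = z_score(65, 50, 10)
print(example_z)  # 1.5
```

A positive z-score means the raw score is above the mean; a negative one means it is below.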
7
Standard Scores (Cont’d)
“z-scores are golden” (Rule #3)
z-scores help us see where an individual’s raw score falls on a normal curve and are helpful for converting a raw score to other kinds of derived scores. That is why we like to keep in mind that z-scores are golden and can often be used to help us understand the meaning of scores.
8
Standard Scores (Cont’d)
Converting a z-score to a percentile
Find your z-score, then either approximate the percentile or look at Appendix E for conversion of z-scores to percentiles.
See Fig. 5.2, p. 88
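When a conversion table is not at hand, the same lookup can be approximated in code. The sketch below assumes scores are normally distributed and uses Python's standard-library `statistics.NormalDist`; the function name `z_to_percentile` is introduced here for illustration only.

```python
from statistics import NormalDist

def z_to_percentile(z):
    """Percentage of the standard normal curve that falls below z."""
    return NormalDist().cdf(z) * 100

for z in (-1.0, 0.0, 1.0):
    print(f"z = {z:+.1f} -> percentile of about {z_to_percentile(z):.0f}")
# z = -1.0 -> percentile of about 16
# z = +0.0 -> percentile of about 50
# z = +1.0 -> percentile of about 84
```

Values computed this way may differ slightly from a printed table because of rounding.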
9
Standard Scores (Cont’d)
Converting to Standard Scores
1. Get your z-score
2. Plug your z-score into the following conversion formula:
(z-score × SD of desired score) + mean of desired score
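Written in symbols, the conversion looks like the first line below, followed by one worked case. The labels $M_{new}$ and $SD_{new}$ are introduced here for the mean and standard deviation of the desired score, and the z-score of 1.0 is an arbitrary example value; the DIQ parameters (M = 100, SD = 15) are the ones given later in this chapter.

$$\text{derived score} = z \times SD_{new} + M_{new}$$

$$\text{DIQ} = 1.0 \times 15 + 100 = 115$$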
10
Standard Scores (Cont’d)
Converting z-scores to standard scores
Use your conversion formula by plugging in the means and standard deviations for each of the respective standard scores that follow:
T-scores (M = 50, SD = 10). Mostly used for personality tests. (Fig. 5.3, p. 89)
DIQ scores (M = 100, SD = 15). Mostly used for tests of intelligence. (Fig. 5.4, p. 90)
Stanines (M = 5, SD = 2; round off to nearest whole number). Mostly used for achievement testing. (Fig. 5.5, p. 91)
11
Standard Scores (Cont’d)
Converting z-scores to standard scores (Cont’d)
Use conversion formula with following:
Sten scores (M = 5.5, SD = 2; round off to nearest whole number). Used with personality inventories and questionnaires. (Fig. 5.6, p. 92)
NCEs (M = 50, SD = 21.06). Like percentiles in that they basically range from 1 to 99, but evenly distributed (percentiles bunch up around the mean). Usually used for educational tests. (Fig. 5.7, p. 93)
12
Standard Scores (Cont’d)
Converting z-scores to standard scores (Cont’d)
Use conversion formula with following:
SATs/GREs (M = 500, SD = 100)
ACTs (M = 21, SD = 5)
Publisher Type Scores: Mean and standard deviations arbitrarily set by publisher
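The means and standard deviations listed on the last three slides can be collected in one place and run through the conversion formula. The sketch below is illustrative only: the names `SCALES`, `LIMITS`, and `convert` are made up here, the z-score of 1.0 is an arbitrary example, and the 1 to 9 and 1 to 10 limits for stanines and stens are the conventional ranges, assumed rather than taken from the slides.

```python
import math

# (mean, SD) pairs as listed on the slides
SCALES = {
    "T-score": (50, 10),
    "DIQ": (100, 15),
    "Stanine": (5, 2),
    "Sten": (5.5, 2),
    "NCE": (50, 21.06),
    "SAT/GRE": (500, 100),
    "ACT": (21, 5),
}

# Assumption: conventional limits for the whole-number scales
LIMITS = {"Stanine": (1, 9), "Sten": (1, 10)}

def convert(z, scale):
    """Apply (z * SD) + mean; round and clamp stanines and stens."""
    mean, sd = SCALES[scale]
    score = z * sd + mean
    if scale in LIMITS:
        low, high = LIMITS[scale]
        score = math.floor(score + 0.5)  # round half up to a whole number
        score = max(low, min(high, score))
    return score

derived = {name: convert(1.0, name) for name in SCALES}
for name, value in derived.items():
    print(f"{name}: {value}")
```

Rounding is done with `math.floor(score + 0.5)` rather than Python's built-in `round`, which rounds halves to the nearest even number and would turn a sten of 6.5 into 6.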
13
Developmental Norms
Age Comparisons
When you are being compared to others at your same age
Often done for physical attributes: My 10-year-old weighs 78 lbs; what percentile is she in compared to others her age?
Can use z-scores to determine: E.g., [78 – 80 (mean)]/8 (SD) = z of -.25, or a percentile of about 40 (p = 40)
See Box 5.4, p. 95
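The weight example can be checked directly, assuming that weights at this age are roughly normally distributed. This sketch uses the mean (80) and SD (8) from the slide and Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Mean and SD of weight for the age group, as given on the slide
weights = NormalDist(mu=80, sigma=8)

z = (78 - 80) / 8
percentile = weights.cdf(78) * 100
print(f"z = {z:.2f}, percentile of about {percentile:.0f}")
# z = -0.25, percentile of about 40
```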
14
Developmental Norms (Cont’d)
Grade Equivalents (GE): Compares child’s score to his or her grade level
E.g., a child in grade 5.6 who scores at the mean compared to his or her peer group has GE = 5.6
GE over 5.6 means that he or she is doing better than his or her peer group
GE below 5.6 means that he or she is doing worse than his or her peer group
Usually, not a statement about how much better or how much worse (a child who is in 5.6 and gets GE of 7.5 is not necessarily at 7.5 grade level)
15
Standard Error of Measurement
Based on reliability of test
Tells you how much error there is in the test and ultimately how much any individual’s score might fluctuate due to this error
Formula: Multiply the standard deviation of the score you are using (e.g., for T-score, SD = 10; for DIQ, SD = 15) times the square root of 1 minus the reliability of the test
See Box 5.5, p. 99
16
Standard Error of Measurement
For example: If you receive a T-score of 60 on a personality test and the reliability of the test is .84, what is your SEM?
SEM = 10 × √(1 – .84) = 10 × √.16 = 10(.4) = 4
This means that 68 percent of the time your score will fall between 56 and 64, and 95 percent of the time your score will fall between 52 and 68
See Figure 5.8, p. 97
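The T-score example can be reproduced with a short sketch. The function names are hypothetical, and the 95 percent band follows the slide's convention of two SEMs on either side of the obtained score.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SD of the score scale times the square root of (1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def score_band(score, sem, n_sems):
    """Range of n_sems standard errors on either side of the obtained score."""
    return (score - n_sems * sem, score + n_sems * sem)

# T-score of 60 (SD = 10) on a test with reliability .84
sem = standard_error_of_measurement(sd=10, reliability=0.84)
band_68 = score_band(60, sem, 1)
band_95 = score_band(60, sem, 2)

print(f"SEM = {sem:.1f}")
print(f"68% band: {band_68[0]:.0f} to {band_68[1]:.0f}")
print(f"95% band: {band_95[0]:.0f} to {band_95[1]:.0f}")
```

Passing a different standard deviation (for example, 15 for a DIQ) gives the SEM on that score's scale.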
17
Standard Error of Estimate
Gives a confidence interval around a predicted score
Based on scores received on one variable, you can predict the range of scores you will receive on a second variable
E.g., If you know your high school GPA and the correlation between high school GPA and 1st year college grades, you can predict the range of GPA you are likely to get in your 1st year of college.
Formula: $SE_{est} = SD_Y \sqrt{1 - r^2}$
See Box 5.6, p. 100
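As a rough sketch of the GPA example, assuming the standard error of estimate is the standard deviation of the predicted variable (Y) times the square root of 1 minus the squared correlation. All numbers below are hypothetical and chosen only to make the arithmetic easy: an SD of 0.5 for first-year college GPA, a correlation of .60, and a predicted GPA of 3.0.

```python
import math

def standard_error_of_estimate(sd_y, r):
    """SD of the predicted variable times sqrt(1 - r squared)."""
    return sd_y * math.sqrt(1 - r ** 2)

# Hypothetical values, not taken from the textbook
sd_college_gpa = 0.5
r = 0.6
predicted_gpa = 3.0

see = standard_error_of_estimate(sd_college_gpa, r)
low, high = predicted_gpa - see, predicted_gpa + see
print(f"SEest = {see:.2f}")
print(f"About 68% of the time, GPA falls between {low:.1f} and {high:.1f}")
```

The stronger the correlation, the smaller the standard error of estimate and the narrower the predicted range.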
18
Rule Number 4: Don’t Mix Apples and Oranges
As you practice various formulas in class, it is easy to use the wrong score, mean, or standard deviation. For instance, in determining the SEM for Latisha (in book), we used Latisha’s DIQ score of 120 and figured out the SEM using the DIQ standard deviation of 15. However, if we had been asked to figure out the SEM of her raw score, we would use her raw score and the standard deviation of raw scores. Whenever you are asked to figure out a problem, remember to use the correct set of numbers (don’t mix apples and oranges), otherwise your answer will be incorrect.
19
Scales of Measurement
In assessment, we measure characteristics in quantifiable terms, but “quantifiable” can be defined differently
E.g., gender is measured as either male or female
E.g., achievement can be measured on a scale that has a large range of scores
Four different scales of measurement have been created to help us define “quantifiable”
Type of scale you use is intimately related to type of instrument you choose
20
Scales of Measurement (Cont’d)
Four scales:
1. Nominal: Numbers arbitrarily assigned to categories
E.g., Race: 1 = Asian, 2 = African American, 3 = Latino, 4 = Caucasian
2. Ordinal: Magnitude or rank order is implied
Rate the following: “The counseling was helpful”
1 = Strongly Disagree
2 = Somewhat Disagree
3 = Neutral
4 = Somewhat Agree
5 = Strongly Agree
21
Scales of Measurement (Cont’d)
Four scales (Cont’d)
3. Interval: Establishes equal distances between measurements. No absolute zero.
E.g., A 600 on the GRE is better than a 550 but not twice as good as a 300
4. Ratio: Has meaningful zero point and equal intervals.
E.g., If I weigh 200, I weigh twice as much as someone who weighs 100