The Art and Science of Applied Test Development. This is the fifth in a series of PPT modules explicating the development of psychological tests in the domain of cognitive ability using contemporary methods (e.g., theory-driven test specification; IRT-Rasch scaling; etc.). The presentations are intended to be conceptual and not statistical in nature. Feedback is appreciated.
The Art and Science of Test Development—Part E
Calculate norms and derived scores
The basic structure and content of this presentation is grounded extensively on the test development procedures developed by Dr. Richard Woodcock
Kevin S. McGrew, PhD.
Educational Psychologist
Research Director, Woodcock-Muñoz Foundation
Part A: Planning, development frameworks & domain/test specification blueprints
Part B: Test and Item Development
Part C: Use of Rasch Technology
Part D: Develop norm (standardization) plan
Part E: Calculate norms and derived scores
Part F: Psychometric/technical and statistical analysis: Internal
Part G: Psychometric/technical and statistical analysis: External
The Art and Science of Test Development
The above titled topic is presented in a series of sequential PowerPoint modules. It is strongly recommended that the modules (A-G) be viewed in sequence.
The current module is designated by red bold font lettering
Norm: A standard or range of values that represents the typical performance of a group or of an individual (of a certain age, for example) against which comparisons can be made
How do we construct age-based norms from standardization norm data?
Answer: Curve fitting of sorted subsample data points is the engine that drives the development of all derived scores
Block Rotation Summary: Final Rasch for publication test (graphic item map)
[Figure: graphic item map on the publication test W-score scale (axis labels 546 and 432), with a graphic display of the distribution of Block Rotation person abilities]
n = 37 norming items (0-74 RS points); n = 4,722 norm subjects
These Block Rotation W-scores are then used for developing test “norms” and validity research
WJ III “classic” norm calculation procedures
[Figure: pool of n = 8,000+ norm subjects, ordered from youngest to oldest; each ball represents an individual norm subject with an age and a W-score]
1. Sort 8,000 subjects from youngest (CA in months) to oldest
2. Divide sorted pool of subjects into successively older blocks of n=50
3. Calculate “weighted” (US Census derived subject weights) median (average) CAMOS (X) and REF W (Y) for each block: Mdn CA = x1, x2, ...; Mdn W = y1, y2, ...
4. Plot mdn CAMOS (x1, x2, ...) and REF W (y1, y2, ...) and smooth curve
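Steps 1 to 3 above can be sketched in a few lines of Python. This is only an illustration, not the WJ III software: the tuple layout `(age_months, w_score, weight)`, the function names, and the simple cumulative-weight definition of a weighted median are all assumptions made for the example, and a short final block is kept as is.

```python
def weighted_median(values, weights):
    """Value at which the cumulative weight first reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("empty input")


def block_medians(subjects, block_size=50):
    """subjects: list of (age_months, w_score, weight) tuples (hypothetical layout).

    Returns one (median age, median W) point per block of `block_size` subjects.
    """
    ordered = sorted(subjects, key=lambda s: s[0])        # step 1: youngest to oldest
    points = []
    for start in range(0, len(ordered), block_size):      # step 2: successive blocks
        block = ordered[start:start + block_size]
        ages = [s[0] for s in block]
        w_scores = [s[1] for s in block]
        weights = [s[2] for s in block]
        points.append((weighted_median(ages, weights),    # step 3: weighted medians
                       weighted_median(w_scores, weights)))
    return points


# Synthetic subjects for illustration only (equal weights, W rising with age).
demo = [(month, 300 + month, 1.0) for month in range(24, 124)]
print(block_medians(demo))
```

The resulting (x, y) points are what step 4 would plot and smooth.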
Example: Letter-Word ID Ref W (20-120 months) raw data points
[Figure: scatter plot; x-axis 20 to 120 (age in months), y-axis 250 to 550 (Ref W)]
Each data point is a “sample” that contains “sampling error”; this accounts for the “bounce” between data points. How do we deal with this sampling error (bounce) to construct norms and derived scores?
Letter-Word ID Ref W (20-120 months) polynomial curve generated solution (using special curve fitting software)
[Figure: smoothed curve through the data points; x-axis 20 to 120 (age in months), y-axis 250 to 550 (Ref W)]
The smoothed curve represents the best approximation of the population average norm W-score for a test (Reference W or REF W)
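To make the idea of smoothing out the “bounce” concrete, here is a minimal sketch that fits a low-order polynomial by ordinary least squares. It is a stand-in for the special curve fitting software, not a reproduction of it: the degree, the rescaling of age to months/100, and the synthetic data are all assumptions chosen for illustration.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... solved via the normal equations."""
    n = degree + 1
    # Augmented normal-equation matrix [X^T X | X^T y].
    rows = []
    for i in range(n):
        row = [sum(x ** (i + j) for x in xs) for j in range(n)]
        row.append(sum(y * x ** i for x, y in zip(xs, ys)))
        rows.append(row)
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, n + 1):
                rows[r][k] -= factor * rows[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(rows[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (rows[i][n] - tail) / rows[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the fitted polynomial at x (Horner's rule)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Synthetic block medians: a smooth trend plus made-up sampling "bounce".
months = list(range(20, 121, 10))
xs = [m / 100 for m in months]                 # rescaled age keeps the fit well conditioned
bounce = [3, -2, 4, -3, 2, -4, 3, -2, 1, -3, 2]
raw_w = [300 + 300 * x - 100 * x * x + b for x, b in zip(xs, bounce)]

coeffs = polyfit(xs, raw_w, degree=2)
smoothed_ref_w = [polyval(coeffs, x) for x in xs]
for m, raw, smooth in zip(months, raw_w, smoothed_ref_w):
    print(m, round(raw, 1), round(smooth, 1))
```

The smoothed column plays the role of the REF W curve; the raw column is what the block medians look like before smoothing.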
Obtaining Developmental Scores (age/grade equivalents)
A W-score of 450 (for Letter-Word Identification test) = 2.4 grade equivalent
A W-score of 400 = 1.3 grade equivalent
Smoothed age curves are used in the same manner to obtain age equivalents
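Reading a grade or age equivalent off a smoothed curve amounts to an inverse lookup: find the grade (or age) at which the curve reaches the obtained W-score. The sketch below assumes the curve is available as a table of (grade, REF W) points and interpolates linearly between neighbors. Only the two points quoted above come from the slides; a real norm table would be far denser, the function name is hypothetical, and the straight line between the two points is purely illustrative.

```python
def equivalent_for_w(w, curve_points):
    """Return the grade (or age) at which a smoothed, increasing curve reaches w.

    curve_points: (grade_or_age, ref_w) pairs taken from the smoothed curve.
    Uses linear interpolation between neighboring points (an assumption).
    """
    pts = sorted(curve_points)
    for (g0, w0), (g1, w1) in zip(pts, pts[1:]):
        if w0 <= w <= w1:
            return g0 + (g1 - g0) * (w - w0) / (w1 - w0)
    raise ValueError("W-score outside the tabled curve")


# The two Letter-Word Identification points quoted in the slides.
letter_word_curve = [(1.3, 400.0), (2.4, 450.0)]
print(equivalent_for_w(450.0, letter_word_curve))   # grade equivalent for W = 450
print(equivalent_for_w(425.0, letter_word_curve))   # interpolated, illustrative only
```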
Developing norms and derived scores: What does a tested person’s score on a test mean when compared to the appropriate reference group? (Age norms will be used as the example.)
The meaning of a Block Rotation W-score of X (e.g., 477) will have different interpretations when compared to different age group norm subsamples
Measures of relative standing (percentile rank, standard score) derive meaning based on how far away the person’s W-score is from average (for age)
[Figure: Block Rotation W-scale marked at 431.6, 477, and 545.7, with groups labeled 2 yr olds, 3 yr olds, 4 yr olds, 5 yr olds]
Obtaining Measures of Relative Standing: A subject’s W-score for a specific measure is compared to the average W-score for that subject’s specific age (age norms) or grade (grade norms). This average is called the Reference W (REF W).
Expected “average REF W” for someone tested at grade 3.0 (grade norms) is 472.5 (an obtained score of 472.5 would be SS=100; PR=50)
SS (M=100; SD=15) = (z × 15) + 100, where z = (obtained W − Mean) / SD
• e.g., z = -1; SS = 85
“Mean” is the smoothed “Ref W” value for a specific age/grade
“SD” is the smoothed SD (10/90) for a specific age/grade
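The standard score formula can be worked through in code. The sketch below makes two assumptions that the slides do not spell out: that the smoothed SD90 value is used for W-scores at or above the REF W and the SD10 value for scores below it, and that a percentile rank can be read from a normal curve. The SD values in the example are invented; only the REF W of 472.5 comes from the slides.

```python
from statistics import NormalDist


def standard_score(w, ref_w, sd10, sd90):
    """Return (z, SS, PR) for an obtained W-score.

    ref_w: smoothed REF W for the subject's age or grade.
    sd10 / sd90: smoothed SDs; choosing between them by side of the
    REF W is an assumption made for this sketch.
    """
    sd = sd90 if w >= ref_w else sd10
    z = (w - ref_w) / sd
    ss = z * 15 + 100                   # SS (M=100; SD=15)
    pr = NormalDist().cdf(z) * 100      # normal-curve percentile rank (assumption)
    return z, ss, pr


# Grade 3.0 example: REF W = 472.5; both SD values are made up.
print(standard_score(472.5, 472.5, sd10=12.0, sd90=10.0))
print(standard_score(460.5, 472.5, sd10=12.0, sd90=10.0))   # one SD below average
```

An obtained score equal to the REF W gives z = 0 and therefore SS = 100, matching the grade 3.0 example; a score one SD below gives SS = 85.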
Obtaining Scores of Relative Standing: A subject’s obtained W-score for a specific measure is compared to the distribution (mean and SD) of W-scores for that subject’s specific age (age norms) or grade (grade norms)
Custom software-generated norm “setup” data file example (Block Rotation): input for graphing and polynomial curve fitting (X and Y columns)
Note: These examples are from original WJ III 2001 norms and not the subsequent WJ III NU (2007) norms
Original Block Rotation Reference W age-based curve fitting: A real-world example of the “art + science” of constructing norms
[Figure: Block Rotation Ref-W by age; x-axis Age (in months), 12 to 1200; y-axis Ref-W, 460 to 510. Fit output: r^2=0.12670607, Eqn 8160 Line(a,b), Robust None; 7667WLO, Eqn 7667 Chebyshev=>Std Rational Order 8/9]
Solution A: Up to 230 months (note: age scale is a log scale)
[Figure: Block Rotation Ref-W by age; x-axis Age (in months), 12 to 1068; y-axis Ref-W, 460 to 510. Fit output: r^2=0.12670607, Eqn 8160 Line(a,b), Robust None; 6870WHI, Eqn 6870 Chebyshev=>Std Polynomial Order 20]
Solution B: 231 to 1200 months (note: age scale is a regular interval scale, not a log scale)
Curve Solution A is “feathered/blended” with Curve Solution B at 230 months to give a single final solution. Sometimes more than two curve parts are needed for age norms.
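The slides do not say how the feathering is done, so the following is only one plausible way to join two curve segments without a visible step at the seam: a linear crossfade inside a window centered on the join age. The function name, the 12-month half-width, and the placeholder curves are all assumptions for illustration.

```python
def blended_ref_w(age, curve_a, curve_b, join=230.0, half_width=12.0):
    """Crossfade from curve_a to curve_b around the join age (in months).

    Below the window only curve_a is used, above it only curve_b;
    inside the window the two are mixed with a linearly changing weight.
    """
    lo, hi = join - half_width, join + half_width
    if age <= lo:
        return curve_a(age)
    if age >= hi:
        return curve_b(age)
    t = (age - lo) / (hi - lo)          # 0 at the window start, 1 at its end
    return (1 - t) * curve_a(age) + t * curve_b(age)


# Placeholder curves standing in for Solution A and Solution B.
solution_a = lambda age: 500.0
solution_b = lambda age: 510.0

for age in (200, 224, 230, 236, 260):
    print(age, blended_ref_w(age, solution_a, solution_b))
```

With more than two curve parts, the same crossfade would be applied at each seam.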
Final smoothed curves serve as the mechanism for the published norms, either in the form of equations in software or as tables of values in test manuals.
[Table: columns Age (in months) and Reference W]
Custom software-generated norm “setup” data file example (Block Rotation): input for graphing and polynomial curve fitting (X and Y columns)
Original Block Rotation SD90 age-based curve fitting: A real-world example of the “art + science” of constructing norms
Block Rotation SD90 Age, Rank 2502, Eqn 7938 [NL]:
y=(a+cx^(0.5)+ex+gx^(1.5)+ix^2)/(1+bx^(0.5)+dx+fx^(1.5)+hx^2+jx^(2.5))
r^2=0.48235094  DF Adj r^2=0.41998358  FitStdErr=1.6978814  Fstat=8.6968999
a=15.791894  b=0.66619087  c=1.7270779  d=-0.2462822  e=-1.0287721
f=0.02543265  g=0.082451267  h=-0.00095281528  i=-0.0010522608  j=1.5044367e-05
[Figure: x-axis Age (in months), 12 to 1200; y-axis SD 90, 1 to 15]
The same is done for SD10
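Once a curve has been selected, publishing it “as an equation in software” means evaluating the fitted function for a given x. The sketch below codes Eqn 7938 with the coefficients printed in the fit output above. It is illustrative only: the slides do not state how x is scaled (raw months, a transformed age, or something else), so no particular SD value should be read from it, and the function name is hypothetical.

```python
# Coefficients copied from the Block Rotation SD90 fit output.
COEF = dict(a=15.791894, b=0.66619087, c=1.7270779, d=-0.2462822,
            e=-1.0287721, f=0.02543265, g=0.082451267,
            h=-0.00095281528, i=-0.0010522608, j=1.5044367e-05)


def eqn_7938(x, p=COEF):
    """Rational function in powers of sqrt(x), as printed in the fit output.

    The scale of x is not documented in the slides (assumption: x >= 0).
    """
    r = x ** 0.5
    numerator = (p["a"] + p["c"] * r + p["e"] * x
                 + p["g"] * x ** 1.5 + p["i"] * x ** 2)
    denominator = (1 + p["b"] * r + p["d"] * x + p["f"] * x ** 1.5
                   + p["h"] * x ** 2 + p["j"] * x ** 2.5)
    return numerator / denominator


print(eqn_7938(0.0))   # at x = 0 the function reduces to the coefficient a
```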
[Figure: the same Block Rotation SD90 fit (Eqn 7938) plotted on a regular interval age scale; x-axis Age (in months), 12 to 1068; y-axis SD 90, 1 to 15]
Final smoothed curves serve as the mechanism for the published norms, either in the form of equations in software or as tables of values in test manuals.
[Table: columns Age (in months) and SD (in W units)]
Obtaining Scores of Relative Standing: A subject’s W-score for a specific measure is then compared to the distribution of W-scores for that subject’s specific age (age norms) or grade (grade norms)
[Figure: Smoothed REF W (average), Smoothed SD90, and Smoothed SD10 curves]
Note: These are NOT the curves for Block Rotation. They are from another measure, used here as an example.
More is better. The larger the sample, the smaller the sampling error associated with computed scores
When calculating norm curves, use medians for each age (or grade) block—not means
Special test(s)-cluster consistency checks and procedures need to be used to prevent test(s)-cluster score anomalies
Apply the proposed norms for each measure to the actual norm data as a quality control procedure
If concurrent validity data are available (correlations with other published and respected measures of similar abilities/constructs), it may be wise to apply the proposed norms to your tests and then compare the resulting derived scores to the external measures via correlations and descriptive statistics (means and SDs).
• May be particularly informative if you begin to question the variability in your norm sample data (too restricted or too variable)
• You are using other established test batteries as crude “benchmarks”
Use of bootstrap re-sampling methods in curve fitting
Special proprietary iterative curve fitting Q/A procedures for selecting best possible curve from a pool of plausible curves
Different subject weighting procedures
Calculating other measure norms
• Cluster norms (combinations of tests)
• Differentially weighted cluster norms (e.g., WJ III GIA cluster)
• Discrepancy norms
Special test-cluster consistency checks and procedures
Creating special Rasch (W-score) based interpretative scoring options and features (e.g., RPI, instructional ranges) – explained in separate PPT module
Special test-length correction procedures for calculation of reliabilities and correlations
Linear vs area (normalization) transformation of scores. Woodcock combined approach
With publication of WJ III NU norms, we now use bootstrap generated “sticks” and not raw single data points
WJ III NU bootstrapping: If you really want to know, check out ASB9
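For readers unfamiliar with bootstrap re-sampling, the general idea can be sketched as follows: resample a block of W-scores with replacement many times, take the median of each resample, and summarize the spread of those medians as an interval rather than a single point. Treating that interval as the kind of “stick” mentioned above is an interpretation, not a description of the WJ III NU procedure (see ASB9 for that); the function name, percentile cut points, and data are assumptions.

```python
import random
from statistics import median


def bootstrap_median_interval(values, n_boot=1000, lower=0.05, upper=0.95, seed=0):
    """Return a (low, high) interval for the median from bootstrap resamples."""
    rng = random.Random(seed)                    # fixed seed for repeatability
    n = len(values)
    medians = sorted(
        median(rng.choices(values, k=n))         # resample with replacement
        for _ in range(n_boot)
    )
    low = medians[int(lower * (n_boot - 1))]
    high = medians[int(upper * (n_boot - 1))]
    return low, high


# Made-up W-scores for one block of 50 norm subjects.
block_w = list(range(400, 450))
print(bootstrap_median_interval(block_w))
```

A curve would then be fitted through (or constrained by) these intervals instead of through single raw data points.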
End of Part E
Additional steps in the test development process will be presented in subsequent modules as they are developed