The Art and Science of Applied Test Development. This is the fifth in a series of PPT modules explicating the development of psychological tests in the domain of cognitive ability using contemporary methods (e.g., theory-driven test specification; IRT-Rasch scaling; etc.). The presentations are intended to be conceptual and not statistical in nature. Feedback is appreciated.
The Art and Science of Test Development—Part G
Psychometric/technical statistical analysis: External
The basic structure and content of this presentation is grounded extensively on the test development procedures developed by Dr. Richard Woodcock
Kevin S. McGrew, PhD.
Educational Psychologist
Research Director, Woodcock-Muñoz Foundation
“In God we trust... all others must show data” (unknown source)
Test authors and publishers have a standards-based responsibility to provide supporting psychometric technical information regarding tests and batteries
Typically in the form of a series of technical chapters in the manual or a separate technical manual
Calculate psychometric/measurement statistics for technical manual/chapters
Use Joint Test Standards as a guide
With external measures
[Figure: Theoretical domain, CHC (g; broad abilities Gf, Gv, Glr, Gs, Gc, Gsm, Ga), linked to the measurement or empirical domain]
External evidence is focused on relations between test battery variables (measures or latent constructs) and other external (outside of battery) constructs, measures, or criteria
External Stage of Test Development
Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics
Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
Method and concepts:
• Group differentiation
• Structural equation modeling
• Correlation of observed measures with other measures
• Multitrait-Multimethod matrix
Characteristics of strong test validity program:
• Focal constructs vary in theorized ways with other constructs
• Measures of the constructs differentiate existing groups that are known to differ on the constructs
• Measures of focal constructs correlate with other validated measures of the same constructs
• Theory-based hypotheses are supported, particularly when compared to rival hypotheses
External Stage of Test Development
Method and concepts:
• Correlation of observed measures with other measures
Characteristics of strong test validity program:
• Measures of focal constructs correlate with other validated measures of the same constructs
Concurrent external validity example: WJ III GIA cluster correlations with other IQ battery full scale scores
Provide evidence at select key age groups (related to intended age range and purpose of battery) in normal samples
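A concurrent validity coefficient of this kind is a Pearson correlation between two sets of scores from the same examinees. The following is a minimal sketch only. The function name and the score lists are hypothetical and are not WJ III data.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical standard scores for seven examinees (illustration only)
battery_composite = [85, 92, 100, 104, 110, 118, 125]
external_full_scale = [88, 90, 103, 99, 112, 115, 121]

r = pearson_r(battery_composite, external_full_scale)
print(f"concurrent validity r = {r:.2f}")
```

In practice the coefficient would be computed separately within each key age group, as the slide recommends.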
Concurrent external validity example: WJ III Achievement (reading, math, writing) cluster correlations with measures from other (external) achievement batteries
Provide evidence at select key age groups (related to intended age range and purpose of battery) in normal samples
Other battery   Other battery total (full scale) score   WJ III Pred. Ach.   WJ III GIA-Extended   WJ III GIA-Standard
DAS             .41                                      --                  .52                   .47
WPPSI-R         .37                                      --                  .52                   .47
WISC-III        .50                                      .68                 .67                   .63
WAIS-III        .39                                      .56                 --                    .56
KAIT            .53                                      .56                 --                    .56
Concurrent external validity example: Comparative predictive validity (of achievement)
Comparisons of correlations (across reading, math, written language, and total achievement domains) of the average WJ III GIA and Predicted Achievement score options and full scale scores from other (external) major intelligence batteries
Provide evidence at select key age groups (related to intended age range and purpose of battery) in normal samples
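The slides do not say how correlations were averaged across achievement domains. One common approach, shown here only as an assumption, is to average in Fisher's z metric and back-transform. The domain correlations in the example are hypothetical.

```python
import math

def average_correlation(rs):
    """Average correlations via Fisher's r-to-z transform, then back-transform to r."""
    if not rs:
        raise ValueError("need at least one correlation")
    if any(abs(r) >= 1 for r in rs):
        raise ValueError("each correlation must lie strictly between -1 and 1")
    zs = [math.atanh(r) for r in rs]      # r -> z
    return math.tanh(sum(zs) / len(zs))   # mean z -> r

# Hypothetical correlations of one ability score with reading, math,
# written language, and total achievement (illustration only)
domain_rs = [0.62, 0.70, 0.58, 0.71]
print(f"average r = {average_correlation(domain_rs):.2f}")
```

A simple arithmetic mean of the correlations is also seen in practice. The two approaches differ little when the correlations are similar in size.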
External Stage of Test Development
Method and concepts:
• Correlation of observed measures with other measures
Characteristics of strong test validity program:
• Focal constructs vary in theorized ways with other constructs
• Measures of focal constructs correlate with other validated measures of the same constructs
• Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs
(select illustrative examples: concurrent external validity correlations)
• Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs
(select illustrative example: exploratory factor analysis of select WJ III and WISC-III tests)
WISC-III Tests         WJ III BLKROT
Information            0.27
Coding                 0.08
Similarities           0.29
Picture Arrangement    0.14
Arithmetic             0.09
Block Design           0.38
Vocabulary             0.23
Object Assembly        0.31
Comprehension          0.15
Symbol Search          0.23
Digit Symbol           0.08
Note: The absolute magnitude of the correlations is artificially low due to range restriction in the sample. The important observation is the relative magnitude of the correlations.
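To see why range restriction lowers the absolute size of a correlation, the standard correction for direct range restriction on one variable (often labeled Thorndike Case 2) can be sketched as follows. The slides do not report the sample or population SDs, so both SD values here are hypothetical. Only the .38 comes from the table above.

```python
import math

def correct_range_restriction(r_restricted, sd_restricted, sd_unrestricted):
    """Estimate the unrestricted correlation under direct range restriction.

    Uses r_c = r*u / sqrt(1 + r^2 * (u^2 - 1)), where u is the ratio of the
    unrestricted SD to the restricted SD of the selection variable.
    """
    if sd_restricted <= 0 or sd_unrestricted <= 0:
        raise ValueError("standard deviations must be positive")
    u = sd_unrestricted / sd_restricted
    r = r_restricted
    return (r * u) / math.sqrt(1 + r * r * (u * u - 1))

# Observed Block Design correlation from the table, with hypothetical SDs
observed_r = 0.38
corrected_r = correct_range_restriction(observed_r, sd_restricted=10.0, sd_unrestricted=15.0)
print(f"observed r = {observed_r:.2f}, corrected r = {corrected_r:.2f}")
```

The correction raises every coefficient when the sample SD is smaller than the population SD, which is why the slide stresses the relative rather than the absolute magnitudes.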
• Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs
(select illustrative example—WJ III Block Rotation [Gv-Vz] correlation with WISC-III tests in grade 3-5 sample)
Phelps et al. (2005) WISC-III/WJ III cross-battery (joint) CFA
• Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs
(select illustrative example: confirmatory factor analysis of select WJ III and WISC-III tests)
Phelps et al. (2005) WISC-III/WJ III cross-battery (joint) CFA
[Figure: path diagram of a joint CFA model. Latent factors shown: g, Gc, Gsm, Grw, Gf, Gv, Gs, Gq. Indicators are test scores from the batteries named in the caption. Individual loadings and residuals are not recoverable from the extracted text.]
Joint WJ III/WAIS-III/WMS-III/KAIT CFA, Gregg/Hoy College LD/NLD (n=200) Sample. Analysis by K. McGrew
(This is NOT the complete model; only the portion that includes Gv factor information is shown)
External Stage of Test Development
Method and concepts:
• Structural equation modeling
Characteristics of strong test validity program:
• Theory-based hypotheses are supported, particularly when compared to rival hypotheses
Structural equation modeling external validity evidence example
[Figure: SEM path diagram, ages 6-8. Labeled nodes: g, Gf, Gv, Gs, Glr, Ga, Gc, MemSpan, WorkMem, WA. Observed tests: Picture Recognition, Block Rotation, Spatial Relations, Visual Matching, Decision Speed, Cross Out, Sound Blending, Incomplete Words, Sound Patterns, Vis-Aud Learning, DR: Vis-Aud Lrng, Retrieval Fluency, Memory for Names, Verbal Comp, General Information, Oral Comp, Analysis-Synthesis, Concept Formation, Numerical Reas, Mem for Sentences, Mem for Words, Numbers Reversed, Aud Working Mem, Word Attack. Path coefficients are not recoverable from the extracted text.]
External Stage of Test Development
Method and concepts:
• Group differentiation
Characteristics of strong test validity program:
• Measures of the constructs differentiate existing groups that are known to differ on the constructs
Group differentiation external validity evidence example: LD vs Non-LD university samples
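One simple way to express how well a measure separates two known groups is a standardized mean difference (Cohen's d with a pooled SD). This is a sketch under that assumption. The slides do not state which statistic was reported, and the scores below are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least 2 scores")
    # Pool the two sample variances, weighting by degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("effect size is undefined when both groups have no variance")
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical cluster scores (illustration only, not the university samples)
non_ld_scores = [102, 110, 98, 115, 107, 104]
ld_scores = [91, 99, 88, 104, 95, 93]
print(f"d = {cohens_d(non_ld_scores, ld_scores):.2f}")
```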
Group differentiation external validity evidence example: Normal/Gifted/LD/MR samples
Group differentiation external validity evidence example: discriminant function analysis (Normal/Gifted/LD/MR samples)
Group differentiation external validity evidence example: discriminant function analysis classification accuracy (Normal/Gifted/LD/MR samples, grade 3-4)
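Classification accuracy is typically summarized as the proportion of cases the discriminant functions assign to their actual group, overall and within each group. The sketch below assumes the predicted labels have already been produced by some discriminant analysis. The labels are hypothetical, not the grade 3-4 results.

```python
from collections import Counter

def classification_accuracy(actual, predicted):
    """Return (overall hit rate, per-group hit rates) for paired group labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two equal-length, non-empty label lists")
    hits = Counter()
    totals = Counter()
    for a, p in zip(actual, predicted):
        totals[a] += 1
        if a == p:
            hits[a] += 1
    overall = sum(hits.values()) / len(actual)
    per_group = {group: hits[group] / totals[group] for group in totals}
    return overall, per_group

# Hypothetical actual and predicted group memberships (illustration only)
actual = ["Normal"] * 4 + ["Gifted"] * 2 + ["LD"] * 2 + ["MR"] * 2
predicted = ["Normal", "Normal", "Normal", "LD",
             "Gifted", "Normal",
             "LD", "Normal",
             "MR", "MR"]

overall, per_group = classification_accuracy(actual, predicted)
print(f"overall hit rate = {overall:.2f}")
for group, rate in per_group.items():
    print(f"{group}: {rate:.2f}")
```

Per-group rates matter because a high overall rate can hide poor classification of small groups.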
Group differentiation external validity evidence example (variety of “clinical disorder groups”)
Group differentiation external validity evidence example (cont.): variety of “clinical disorder groups”
Lack of rigor and quality control in all prior/earlier stages will “rattle through the data” and rear its ugly head when performing the final statistical analysis, especially multivariate validity analyses (SEM, DF, multiple regression, EFA, CFA)
Shortcuts in prior stages will “bite you in the ____” as you attempt to perform the final statistical analysis
Data screening, data screening, data screening! Do it prior to performing the final statistical analysis
• Compute extensive descriptive statistical analysis for all variables (e.g., histograms, scatterplots, box-whisker plots, etc.)
• More than means and SDs. Also calculate median, skew, kurtosis, n-tiles, etc.
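The screening statistics listed above can be sketched with the Python standard library alone. This is a minimal illustration, assuming simple population (moment-based) estimators of skew and kurtosis. Statistical packages often report bias-corrected versions, so values may differ slightly. The function name and scores are hypothetical.

```python
from statistics import mean, median, pstdev, quantiles

def describe(scores):
    """Basic screening statistics for one variable."""
    n = len(scores)
    if n < 4:
        raise ValueError("need at least 4 scores")
    m = mean(scores)
    sd = pstdev(scores)  # population SD, used to standardize the scores
    if sd == 0:
        raise ValueError("skew and kurtosis are undefined for a constant variable")
    z = [(x - m) / sd for x in scores]
    return {
        "n": n,
        "mean": m,
        "sd": sd,
        "median": median(scores),
        "skew": sum(v ** 3 for v in z) / n,                   # third standardized moment
        "excess_kurtosis": sum(v ** 4 for v in z) / n - 3.0,  # 0 for a normal distribution
        "quartiles": quantiles(scores, n=4),                  # three cut points
    }

# Hypothetical raw scores with one suspicious value (illustration only)
raw_scores = [12, 15, 14, 16, 13, 15, 14, 58]
for name, value in describe(raw_scores).items():
    print(name, value)
```

A large positive skew together with a mean well above the median, as in this example, is the kind of pattern that should prompt a check of the raw data before any multivariate analysis.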
Deliberately planned and sophisticated “front end” data collection short-cuts (e.g., matrix sampling) introduce an extreme level of “back end” complexity to routine statistical/psychometric analysis
Know your limits, level of expertise, and skills. Even those with extensive test development experience often need access to trusted measurement/statistical consultants
(Note: The following information is almost identical to that presented in Part F—Internal psychometric/statistical analysis)
Published statistics/psychometric information needs to be based on final publication-length tests
• Often need to use test-length correction formulas (e.g., the Spearman-Brown prophecy formula) for test reliabilities
• Correlations between short and/or long norming versions of a test and other tests, which differ in test length (number of items) from the publication-length test, may need special adjustments/corrections.
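A widely used test-length correction for reliabilities is the Spearman-Brown prophecy formula, which projects the reliability of a test whose length is changed by a factor k. It is shown here as one example of such a correction, not necessarily the procedure used for any particular battery, and it assumes the added or removed items are comparable to the rest. The item counts and reliability below are hypothetical.

```python
def spearman_brown(reliability, length_ratio):
    """Projected reliability when test length is multiplied by length_ratio.

    r_new = k * r / (1 + (k - 1) * r), where k = new length / old length.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    if length_ratio <= 0:
        raise ValueError("length_ratio must be positive")
    return (length_ratio * reliability) / (1 + (length_ratio - 1) * reliability)

# Hypothetical case: a 40-item norming version with reliability .90 is
# published as a 30-item test
norming_items, publication_items = 40, 30
projected = spearman_brown(0.90, publication_items / norming_items)
print(f"projected publication-length reliability = {projected:.3f}")
```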
Back up, back up, back up!!!!!!!!!! Don’t let a dead hard drive or computer destroy your work and progress. Do it constantly. Build redundancy into your files and people skill sets
Sad fact: The majority of test users do NOT pay attention to the fancy and special psychometric/statistical analyses you report in technical chapters or manuals. Be prepared for post-publication education via other methods.
Post-manual publication technical reports of special/sophisticated analyses are good when publication time-line pressures dictate making difficult decisions.
Most test developers are stuck in a methodological rut. There is much that can be learned about the internal and external validity of a test battery using lesser-used statistical methods.
• Multidimensional scaling (MDS), cluster analysis, CART (classification and regression tree analysis), MARS (multivariate adaptive regression splines)
Use of curve smoothing procedures to better estimate population parameters from statistical analyses across age groups.
Multiple group CFA (planned incomplete data) reference variable validity designs and methods (Jack McArdle).
End of Part G
Additional steps in test development process will be presented in subsequent modules as they are developed