Applied Psych Test Design: Part G: Psychometric/technical statistical analysis: External

The Art and Science of Test Development—Part G

Psychometric/technical statistical analysis: External

The basic structure and content of this presentation is grounded extensively on the test development procedures developed by Dr. Richard Woodcock

Kevin S. McGrew, PhD.

Educational Psychologist

Research Director, Woodcock-Muñoz Foundation

“In God we trust... all others must show data” (unknown source)

Test authors and publishers have a standards-based responsibility to provide supporting psychometric technical information regarding tests and batteries

Typically in the form of a series of technical chapters in the manual or a separate technical manual

Calculate psychometric/measurement statistics for technical manual/chapters

Use Joint Test Standards as a guide

With external measures

[Figure: theoretical domain (CHC: g; Gf, Gc, Gv, Glr, Gs, Gsm, Ga) shown alongside the measurement or empirical domain]

External evidence is focused on relations between test battery variables (measures or latent constructs) and other external (outside of battery) constructs, measures, or criteria

External Stage of Test Development

Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics

Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?

Method and concepts:
• Group differentiation
• Structural equation modeling
• Correlation of observed measures with other measures
• Multitrait-Multimethod matrix

Characteristics of strong test validity program

• Focal constructs vary in theorized ways with other constructs
• Measures of the constructs differentiate existing groups that are known to differ on the constructs
• Measures of focal constructs correlate with other validated measures of the same constructs
• Theory-based hypotheses are supported, particularly when compared to rival hypotheses




Method and concepts • Correlation of observed measures with other measures


• Measures of focal constructs correlate with other validated measures of the same constructs

Concurrent external validity example: WJ III GIA cluster correlations with other IQ battery full scale scores

Provide evidence at select key age groups (related to intended age range and purpose of battery) in normal samples

Concurrent external validity example: WJ III Achievement (reading, math, writing) cluster correlations with measures from other (external) achievement batteries
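As a minimal sketch of how a single concurrent validity coefficient of this kind might be computed, the block below calculates a Pearson correlation between two sets of scores and a Fisher z confidence interval. The function names and the score lists are hypothetical illustrations, not data or procedures from the WJ III technical manual.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence interval for r using the Fisher z transformation (needs n > 3, |r| < 1)."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical standard scores for the same examinees on two batteries
battery_a = [85, 92, 100, 104, 110, 118, 97, 101]
battery_b = [88, 90, 103, 99, 112, 115, 95, 106]

r = pearson_r(battery_a, battery_b)
low, high = fisher_ci(r, len(battery_a))
print(f"r = {r:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Reporting the interval alongside the coefficient makes the effect of a small validity sample visible.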


Battery     Other battery total    WJ III        WJ III          WJ III
            (full scale) score     Pred. Ach.    GIA-Extended    GIA-Standard
DAS         .41                    --            .52             .47
WPPSI-R     .37                    --            .52             .47
WISC-III    .50                    .68           .67             .63
WAIS-III    .39                    .56           --              .56
KAIT        .53                    .56           --              .56

Concurrent external validity example: Comparative predictive validity (of achievement)

Comparison of the average correlations (across reading, math, written language, and total achievement domains) of the WJ III GIA and Predicted Achievement score options with those of full scale scores from other (external) major intelligence batteries





Method and concepts • Correlation of observed measures with other measures


• Focal constructs vary in theorized ways with other constructs

• Measures of focal constructs correlate with other validated measures of the same constructs

• Focal constructs vary in theorized ways with other constructs

• Measures correlate with other validated measures of the same constructs

(select illustrative examples—concurrent external validity correlations)


• Focal constructs vary in theorized ways with other constructs

• Measures correlate with other validated measures of the same constructs

(select illustrative example: exploratory factor analysis of select WJ III and WISC-III tests)

WISC-III test          Correlation with WJ III BLKROT
Information            0.27
Coding                 0.08
Similarities           0.29
Picture Arrangement    0.14
Arithmetic             0.09
Block Design           0.38
Vocabulary             0.23
Object Assembly        0.31
Comprehension          0.15
Symbol Search          0.23
Digit Symbol           0.08

Note: The absolute magnitude of the correlations is artificially low due to sample range restriction. The important observation is the relative magnitude of the correlations.
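To illustrate why range restriction lowers observed correlations, the sketch below applies the standard correction for direct range restriction on one variable (often called Thorndike Case 2). The presentation does not say that any such correction was applied to these data. The function name and the standard deviations in the example are assumptions chosen only for illustration.

```python
import math


def correct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Estimate the unrestricted correlation from a range-restricted one.

    Assumes direct selection on one variable, linearity, and homoscedasticity.
    """
    u = sd_unrestricted / sd_restricted
    return (r_restricted * u) / math.sqrt(
        1 - r_restricted ** 2 + (r_restricted ** 2) * (u ** 2)
    )


# Observed r = .38 (Block Design row above); both SD values are hypothetical.
estimate = correct_range_restriction(0.38, sd_unrestricted=15, sd_restricted=10)
print(f"estimated unrestricted r = {estimate:.2f}")
```

When the two standard deviations are equal the function returns the observed correlation unchanged, which is a quick sanity check on the formula.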

• Focal constructs vary in theorized ways with other constructs

• Measures correlate with other validated measures of the same constructs

(select illustrative example—WJ III Block Rotation [Gv-Vz] correlation with WISC-III tests in grade 3-5 sample)

Phelps et al. (2005) WISC-III/WJ III cross-battery (joint) CFA

• Focal constructs vary in theorized ways with other constructs

• Measures correlate with other validated measures of the same constructs

(select illustrative example: confirmatory factor analysis of select WJ III and WISC-III tests)


[Figure: path diagram of a joint CFA with latent factors g, Gc, Gsm, Grw, Gf, Gv, Gs, and Gq; indicator codes, loadings, and residual labels not reproduced]

Joint WJ III/WAIS-III/WMS-III/KAIT CFA. Gregg/Hoy College LD/NLD (n=200) sample. Analysis by K. McGrew

(This is NOT the complete model; only the portion that includes Gv factor information)




Method and concepts • Structural equation modeling


• Theory-based hypotheses are supported, particularly when compared to rival hypotheses

Structural equation modeling external validity evidence example


[Figure: structural equation model path diagram, ages 6-8. Latent variables labeled in the diagram: g, Gf, Gc, Gv, Gs, Glr, Ga, MemSpan, WorkMem, Oral Comp, and WA. Indicator tests labeled in the diagram: Analysis-Synthesis, Concept Formation, Numerical Reas, Verbal Comp, General Information, Block Rotation, Spatial Relations, Picture Recognition, Visual Matching, Decision Speed, Cross Out, Vis-Aud Learning, DR: Vis-Aud Lrng, Memory for Names, Retrieval Fluency, Sound Blending, Incomplete Words, Sound Patterns, Mem for Sentences, Mem for Words, Numbers Reversed, Aud Working Mem, and Word Attack. Path coefficients and residual labels not reproduced]





Method and concepts • Group differentiation


• Measures of the constructs differentiate existing groups that are known to differ on the constructs

Group differentiation external validity evidence example: LD vs Non-LD university samples

Group differentiation external validity evidence example: Normal/Gifted/LD/MR samples
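One simple way to quantify how strongly a measure separates two known groups is a standardized mean difference (Cohen's d with a pooled standard deviation). The sketch below is an assumption about how such a comparison might be summarized, not the analysis reported in the slides. The function name and the scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical cluster scores for a non-LD and an LD sample
non_ld = [104, 98, 110, 101, 95, 107]
ld = [92, 88, 99, 85, 94, 90]

print(f"d = {cohens_d(non_ld, ld):.2f}")
```

A positive d here means the first group scored higher on average, in pooled standard deviation units.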

Group differentiation external validity evidence example—discriminant function analysis

(Normal/Gifted/LD/MR samples)

Group differentiation external validity evidence example—discriminant function analysis classification accuracy

(Normal/Gifted/LD/MR samples—grade 3-4)
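Classification accuracy from a discriminant function analysis is usually read off a table of actual versus predicted group membership. The sketch below summarizes such a table as an overall hit rate, per-group hit rates, and the proportional chance criterion (the sum of squared group proportions) as a baseline. The counts and the function name are hypothetical and do not come from the WJ III samples.

```python
def classification_summary(confusion):
    """Summarize a confusion table given as confusion[actual][predicted] = count."""
    group_sizes = {g: sum(row.values()) for g, row in confusion.items()}
    total = sum(group_sizes.values())
    hits = sum(confusion[g].get(g, 0) for g in confusion)
    per_group = {g: confusion[g].get(g, 0) / group_sizes[g] for g in confusion}
    # Accuracy expected if cases were assigned at random in proportion to group size
    chance = sum((size / total) ** 2 for size in group_sizes.values())
    return hits / total, per_group, chance


# Hypothetical counts: rows are actual groups, columns are predicted groups
table = {
    "Normal": {"Normal": 40, "Gifted": 5, "LD": 5},
    "Gifted": {"Normal": 6, "Gifted": 24, "LD": 0},
    "LD": {"Normal": 8, "Gifted": 0, "LD": 22},
}

accuracy, per_group, chance = classification_summary(table)
print(f"overall accuracy = {accuracy:.2f}, chance baseline = {chance:.2f}")
print(per_group)
```

Comparing the overall hit rate with the chance baseline guards against over-interpreting accuracy when one group dominates the sample.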

Group differentiation external validity evidence example (variety of “clinical disorder groups”)


Group differentiation external validity evidence example (cont.) (variety of “clinical disorder groups”)

Lack of rigor and quality control in all prior/earlier stages will “rattle through the data” and rear its ugly head when performing the final statistical analysis, especially multivariate validity analyses (SEM, DF, multiple regression, EFA, CFA)

Shortcuts in prior stages will “bite you in the ____” as you attempt to perform final statistical analysis

Data screening, data screening, data screening!!!! Do it prior to performing final statistical analysis

• Compute extensive descriptive statistical analysis for all variables (e.g., histograms, scatterplots, box-whisker plots, etc.)

• More than means and SDs. Also calculate median, skew, kurtosis, n-tiles, etc.
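The screening statistics listed above can be sketched with the Python standard library alone. The function below is a minimal illustration, assuming moment-based (population) formulas for skew and excess kurtosis. The function name and the sample scores are hypothetical.

```python
import statistics


def describe(values):
    """Return basic screening statistics for one variable."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments (population form) used for skew and kurtosis
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "n": n,
        "mean": mean,
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "skew": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
        "quartiles": statistics.quantiles(values, n=4),
    }


# Hypothetical raw scores containing one suspicious outlier
scores = [12, 14, 15, 15, 16, 17, 18, 18, 19, 45]
for name, value in describe(scores).items():
    print(name, value)
```

A mean well above the median together with a large positive skew, as in this example, is the kind of pattern that should send you back to the raw data before any multivariate analysis.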

Deliberately planned and sophisticated “front end” data collection short-cuts (e.g., matrix sampling) introduce an extreme level of “back end” complexity to routine statistical/psychometric analysis

Know your limits, level of expertise, and skills. Even those with extensive test development experience often need access to trusted measurement/statistical consultants


(Note: The following information is almost identical to that presented in Part F—Internal psychometric/statistical analysis)

Published statistics/psychometric information needs to be based on final publication length tests

• Often need to use test-length correction formulas (e.g., the Spearman-Brown formula) for test reliabilities

• Correlations between short and/or long norming versions of a test and other tests, where the norming version differs in test length (number of items) from the publication-length test, may need special adjustments/corrections.
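One widely used test-length correction for reliabilities is the Spearman-Brown prophecy formula, sketched below. It assumes the added or removed items are parallel to the existing ones. The item counts and the reliability value in the example are hypothetical, and the function name is an illustration only.

```python
def spearman_brown(reliability, length_ratio):
    """Project reliability for a test whose length changes by length_ratio.

    length_ratio = (items in target form) / (items in form that was analyzed).
    """
    return (length_ratio * reliability) / (1 + (length_ratio - 1) * reliability)


# Hypothetical case: a 30-item norming form with reliability .80
# is shortened to a 20-item publication form.
projected = spearman_brown(0.80, 20 / 30)
print(f"projected publication-length reliability = {projected:.2f}")
```

A ratio below 1 lowers the projected reliability and a ratio above 1 raises it, which matches the expectation that shorter tests are less reliable.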

Back up, back up, back up!!!!!!!!!! Don’t let a dead hard drive or computer destroy your work and progress. Do it constantly. Build redundancy into your files and people skill sets

Sad fact: Majority of test users do NOT pay attention to the fancy and special psychometric/statistical analysis you report in technical chapters or manuals. Be prepared for post-publication education via other methods.

Post-manual publication technical reports of special/sophisticated analyses are good when publication time-line pressures dictate making difficult decisions.

Most test developers are stuck in a methodological rut. There is much that can be learned about the internal and external validity of a test battery using lesser-used statistical methods.

• Multidimensional scaling (MDS); cluster analysis; CART (classification and regression tree analysis); MARS (multivariate adaptive regression splines)

Use of curve smoothing procedures to better estimate population parameters from statistical analyses across age groups.
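The slides do not say which smoothing procedure is meant. As one very simple illustration, the sketch below applies a weighted moving average across adjacent age groups to a statistic such as a reliability or validity coefficient. The weights, the function name, and the age-group values are all assumptions. Operational test development typically uses more formal curve-fitting methods.

```python
def smooth_across_ages(values, weights=(1, 2, 1)):
    """Weighted moving average over adjacent age groups.

    At the youngest and oldest groups, only the available neighbors are used
    and the weights are renormalized.
    """
    half = len(weights) // 2
    smoothed = []
    for i in range(len(values)):
        numerator = denominator = 0.0
        for offset, w in enumerate(weights, start=-half):
            j = i + offset
            if 0 <= j < len(values):
                numerator += w * values[j]
                denominator += w
        smoothed.append(numerator / denominator)
    return smoothed


# Hypothetical coefficients for consecutive age groups
by_age = [0.78, 0.84, 0.79, 0.86, 0.88, 0.83, 0.90]
print([round(v, 3) for v in smooth_across_ages(by_age)])
```

The smoothed values damp sample-to-sample fluctuation between neighboring age groups while preserving the overall developmental trend.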

Multiple group CFA (planned incomplete data) reference variable validity designs and methods (Jack McArdle).

End of Part G

Additional steps in test development process will be presented in subsequent modules as they are developed
