PSYC 426 Dr. Kline
References
Slide details
References
Binet, A., & Simon, T. (1916). New methods for the diagnosis of the intellectual level of subnormals. In E. S. Kite (Trans.), The development of intelligence in children. Vineland, NJ: Publications of the Training School at Vineland.
Spearman, C. (1904a). General intelligence, objectively determined and measured. American Journal of Psychology, 15, 201–293.
Spearman, C. (1904b). The proof and measurement of the association between two things. American Journal of Psychology, 15, 72–101.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: Author.
Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board Task Force report. American Psychologist, 73, 3–25.
Aiken, L. S., West, S., & Millsap, R. E. (2008). Doctoral training in statistics, measurement, and methodology in psychology: Replication and extension of Aiken, West, Sechrest, and Reno’s (1990) survey of PhD programs in North America. American Psychologist, 63, 32–50.
Vacha-Haase, T., & Thompson, B. (2011). Score reliability: A retrospective look back at 12 years of reliability generalization. Measurement and Evaluation in Counseling and Development, 44, 159–168.
Thompson, B. (Ed.). (2003). Score reliability: Contemporary thinking on reliability issues. Thousand Oaks, CA: Sage.
Mulsant, B. H., Kastango, K. B., Rosen, J., Stone, R. A., Mazumdar, S., & Pollock, B. G. (2002). Interrater reliability in clinical trials of depressive disorders. American Journal of Psychiatry, 159, 1598–1600.
Streiner, D. L. (2003). Starting at the beginning: An introduction to coefficient alpha and internal consistency. Journal of Personality Assessment, 80, 99–103.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73.
Reeves, T. D., & Marbach-Ad, G. (2016). Contemporary test validity in theory and practice: A primer for discipline-based education researchers. CBE Life Sciences Education, 15(1), 1–9.
Kline, R. B. (2013). Exploratory and confirmatory factor analysis. In Y. Petscher & C. Schatschneider (Eds.), Applied quantitative analysis in the social sciences (pp. 171–207). New York: Routledge.
Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10(7). Retrieved from http://pareonline.net/
Huck, S. W. (2016). Statistical misconceptions (Classic ed.). New York: Routledge.
Downing, S. M., & Haladyna, T. M. (1997). Test item development: Validity evidence from quality assurance procedures. Applied Measurement in Education, 10, 61–82.
First, M. B., Williams, J. B. W., Karg, R. S., & Spitzer, R. L. (2016). Structured Clinical Interview for DSM-5 Disorders, Clinician Version (SCID-5-CV). Arlington, VA: American Psychiatric Association.
Schwarz, N., Knäuper, B., Hippler, H.-J., Noelle-Neumann, E., & Clark, L. (1991). Rating scales: Numeric values may change the meaning of scale labels. Public Opinion Quarterly, 55, 570–582.
Baron, H. (1996). Strengths and limitations of ipsative measurement. Journal of Occupational and Organizational Psychology, 69, 49–56.
Kelley, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30, 17–24.
Ellingsen, K. M. (2016). Standardized assessment of cognitive development: Instruments and issues. In A. Garro (Ed.), Early childhood assessment in school and clinical child psychology (pp. 25–49). New York: Springer.
Gardner, H. (1993). Multiple intelligences: The theory in practice. New York: Basic.
Cormier, D. C., Kennedy, K. E., & Aquilina, A. M. (2016). Test review of the Wechsler Intelligence Scale for Children, Fifth Edition: Canadian (WISC-VCDN). Canadian Journal of School Psychology, 31, 322–334.
van Widenfelt, B. M., Treffers, P. D. A., de Beurs, E., Siebelink, B. M., & Koudijs, E. (2005). Translation and cross-cultural adaptation of assessment instruments used in psychological research with children and families. Clinical Child and Family Psychology Review, 8, 135–147.
Van de Vijver, F., & Hambleton, R. K. (1996). Translating tests: Some practical guidelines. European Psychologist, 1, 89–99.
Ercikan, K., Gierl, M. J., McCreith, T., Puhan, G., & Koh, K. (2004). Comparability of bilingual versions of assessments: Sources of incomparability of English and French versions of Canada's national achievement tests. Applied Measurement in Education, 17, 301–321.
Reynolds, C. R., & Suzuki, L. A. (2013). Bias in psychological assessment: An empirical review and recommendations. In J. R. Graham, J. A. Naglieri, & I. B. Weiner (Eds.), Handbook of psychology: Assessment psychology (pp. 82–113). Hoboken, NJ: Wiley.
Tierney, R. D. (2016). Fairness in educational assessment. In M. A. Peters (Ed.), Encyclopedia of educational philosophy and theory (pp. 793–798). Singapore: Springer.
Messick, S. (1995). Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Karami, H., & Mok, M. M. C. (Eds.). (2013). Fairness issues in educational assessment [Special issue]. Educational Research and Evaluation, 19(2–3).
Laundra, K., & Sutton, T. (2008). You think you know ghetto? Contemporizing the Dove “Black IQ Test.” Teaching Sociology, 36, 366–377.
Kaufman, A. S. (1990). Assessing adolescent and adult intelligence. Boston: Allyn & Bacon.
Kline, R. B. (1998). [Review of the software Kaufman WISC-III Integrated Interpretive System (K-WIIS, Version 1.00), by A. S. Kaufman, N. L. Kaufman, E. H. Dougherty, & K. S. C. Tuttle]. Journal of Psychoeducational Assessment, 16, 365–384.
Hunsley, J., Lee, C. M., Wood, J. M., & Taylor, W. (2015). Controversial and questionable assessment techniques. In S. O. Lilienfeld, S. J. Lynn, & J. M. Lohr (Eds.), Science and pseudoscience in clinical psychology (2nd ed., pp. 42–82). New York: Guilford.
Wood, J. M., Nezworski, M. T., Lilienfeld, S. O., & Garb, H. N. (2003). What's wrong with the Rorschach: Science confronts the controversial inkblot test. San Francisco: Jossey-Bass.
Slide details
Introduction
CTT
Scores
Reliability
Validity
Items
Ability
Translation
Bias
Ethics
Introduction
E.g., WISC-V
http://www.apadivisions.org/division-5/resources/doctoral.aspx
CTT
Scores
$$z = \frac{X - M}{SD}$$
E.g., X = 112, M = 100.0, SD = 15.0
$$z = \frac{112 - 100.0}{15.0} = .80$$
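The same standardization as a small Python function, checked against the example above. `z_score` is a hypothetical helper name, not part of any scoring package.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standardize a raw score: z = (X - M) / SD."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (x - mean) / sd


print(round(z_score(112, 100.0, 15.0), 2))  # 0.8
```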
http://onlinestatbook.com/2/calculators/normal_dist.html
Grade M
2.0 15.0
3.0 25.0
4.0 30.0
Grade M Interpolation
2.0 15
2.1 16
2.2 17
2.3 18
2.4 19
2.5 20
2.6 21
2.7 22
2.8 23
2.9 24
3.0 25
Grade M Interpolation
3.0 25
3.1 25.5
3.2 26
3.3 26.5
3.4 27
3.5 27.5
3.6 28
3.7 28.5
3.8 29
3.9 29.5
4.0 30
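The interpolated rows follow straight lines between adjacent norm points (2.0 → 15, 3.0 → 25, 4.0 → 30). A minimal Python sketch under that assumption; `NORMS`, `mean_at_grade`, and `grade_equivalent` are hypothetical names, and the inverse (raw score to grade equivalent) assumes M rises strictly from one norm grade to the next.

```python
from bisect import bisect_right

# Norm points from the slide: (grade, mean raw score M)
NORMS = [(2.0, 15.0), (3.0, 25.0), (4.0, 30.0)]


def _segment(keys, value):
    """Index i such that value lies between keys[i - 1] and keys[i]."""
    if not keys[0] <= value <= keys[-1]:
        raise ValueError("value outside the normed range")
    return min(bisect_right(keys, value), len(keys) - 1)


def mean_at_grade(grade, norms=NORMS):
    """Linearly interpolate M between the two nearest norm grades."""
    i = _segment([g for g, _ in norms], grade)
    (g0, m0), (g1, m1) = norms[i - 1], norms[i]
    return m0 + (grade - g0) * (m1 - m0) / (g1 - g0)


def grade_equivalent(raw, norms=NORMS):
    """Invert the same line segments: raw score -> grade equivalent."""
    i = _segment([m for _, m in norms], raw)
    (g0, m0), (g1, m1) = norms[i - 1], norms[i]
    return g0 + (raw - m0) * (g1 - g0) / (m1 - m0)


for g in (2.0, 2.5, 3.0, 3.5, 4.0):
    print(g, round(mean_at_grade(g), 1))
print(round(grade_equivalent(27.5), 1))  # 3.5
```

Note that the slope changes at grade 3.0 (10 points per grade below it, 5 above), which is why the two interpolation tables step by 1 and by .5.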
$$RIQ = \frac{MA}{CA} \times 100$$
E.g., CA = 10.0, MA = 12.5
$$RIQ = \frac{12.5}{10.0} \times 100 = 125$$
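A one-function sketch of the ratio IQ with the example values; `ratio_iq` is a hypothetical helper name.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: (MA / CA) x 100."""
    if chronological_age <= 0:
        raise ValueError("CA must be positive")
    return mental_age / chronological_age * 100


print(ratio_iq(12.5, 10.0))  # 125.0
```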
Example: z = .5
Type      M     SD
IQ (W)    100   15
IQ (SB)   100   16
Subtest   10    3
T         50    10
CEEB      500   100
Stanine   5     2
Sten      5.5   2
NCE       50    21.06
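Every metric in the table is a linear transformation of z: score = M + z(SD). A sketch using the table's means and SDs; `METRICS` and `from_z` are hypothetical names. It returns unrounded values, whereas stanines and stens are reported as whole-number bands in practice.

```python
# Means and SDs of standard-score metrics, as listed in the table
METRICS = {
    "IQ (W)": (100, 15),
    "IQ (SB)": (100, 16),
    "Subtest": (10, 3),
    "T": (50, 10),
    "CEEB": (500, 100),
    "Stanine": (5, 2),
    "Sten": (5.5, 2),
    "NCE": (50, 21.06),
}


def from_z(z: float, metric: str) -> float:
    """Convert z to another metric: score = M + z * SD (no rounding)."""
    mean, sd = METRICS[metric]
    return mean + z * sd


for name in METRICS:
    print(f"{name:8s} {from_z(0.5, name):7.2f}")
```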
Case
VIQ = 115, 84.1 %tile
RC = 95, 36.9 %tile
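The percentile ranks above match a normal curve with M = 100 and SD = 15 (an assumption of this sketch). Python's standard-library `statistics.NormalDist` gives the cumulative proportion; `percentile_rank` is a hypothetical helper name.

```python
from statistics import NormalDist


def percentile_rank(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Percentile rank of a score, assuming normally distributed scores."""
    return 100 * NormalDist(mean, sd).cdf(score)


print(round(percentile_rank(115), 1))  # 84.1
print(round(percentile_rank(95), 1))   # 36.9
```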
SB5
Fluid
Knowledge
Quantitative
Visual-Spatial
Working Memory
Bender Visual-Motor Gestalt Test
Reliability
X = t + e
X = t + e
e > 0, X > t
e < 0, X < t
e = 0, X = t
$$s_X^2 = s_t^2 + s_e^2$$
$$r_{XX} = \frac{s_t^2}{s_X^2} = 1 - \frac{s_e^2}{s_X^2}$$
$$r_{Xt} = \sqrt{r_{XX}}$$
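A small simulation makes X = t + e concrete. This sketch assumes normally distributed true scores and independent errors with made-up SDs (13.4 and 6.7, chosen so that r_XX is about .80); it checks that s²_t/s²_X, 1 − s²_e/s²_X, and r²_Xt all land near the same value. Variable names are illustrative only.

```python
import random
from statistics import pvariance

random.seed(426)  # fixed seed so the run is reproducible

N = 50_000
true_scores = [random.gauss(100, 13.4) for _ in range(N)]  # t (hypothetical SD)
errors = [random.gauss(0, 6.7) for _ in range(N)]         # e, independent of t
observed = [t + e for t, e in zip(true_scores, errors)]   # X = t + e

var_x = pvariance(observed)
var_t = pvariance(true_scores)
var_e = pvariance(errors)

r_xx = var_t / var_x            # r_XX = s_t^2 / s_X^2
r_xx_alt = 1 - var_e / var_x    # r_XX = 1 - s_e^2 / s_X^2

# Reliability index r_Xt: correlation of observed with true scores
mean_x, mean_t = sum(observed) / N, sum(true_scores) / N
cov_xt = sum((x - mean_x) * (t - mean_t) for x, t in zip(observed, true_scores)) / N
r_xt = cov_xt / (var_x * var_t) ** 0.5

print(round(r_xx, 3), round(r_xx_alt, 3), round(r_xt ** 2, 3))  # each close to .80
```

In any finite sample t and e are not exactly uncorrelated, so the three values agree only approximately.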
Test-retest rtt
Interrater (interscorer) r
Mulsant et al. (2002)
Reported (%)…
No. raters, 17; Training, 10; Interrater, 14–22; Drift, 5
Alternate-form r11
12-item test
1st half–2nd half
tot1/2 = ∑ (i1 – i6)
tot2/2 = ∑ (i7 – i12)
Odd–even
toto = ∑ (i1, i3, i5, i7, i9, i11)
tote = ∑ (i2, i4, i6, i8, i10, i12)
Possible split halves
$$\tfrac{1}{2}\cdot\frac{12!}{6!\,6!} = 462 \qquad \tfrac{1}{2}\cdot\frac{30!}{15!\,15!} = 77{,}558{,}760$$
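The count of distinct split halves is "n choose n/2", halved because swapping the two halves gives the same split. A sketch with the standard library's `math.comb`; `n_split_halves` is a hypothetical helper name.

```python
from math import comb


def n_split_halves(n_items: int) -> int:
    """Distinct ways to divide an even number of items into two equal halves."""
    if n_items % 2:
        raise ValueError("n_items must be even")
    # Halve the count: {A, B} and {B, A} are the same split
    return comb(n_items, n_items // 2) // 2


print(n_split_halves(12))  # 462
print(n_split_halves(30))  # 77558760
```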
Split-half rhh
$$r_{S\text{-}B} = \frac{2\,r_{hh}}{1 + r_{hh}} = \frac{2(.70)}{1 + .70} = .82$$
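An odd–even split-half sketch on made-up 0/1 data, stepped up to full length with the Spearman-Brown correction. `pearson_r`, `step_up`, `split_half`, and the response matrix are hypothetical.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation using population moments."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


def step_up(r_hh):
    """Spearman-Brown correction from half-length to full-length reliability."""
    return 2 * r_hh / (1 + r_hh)


def split_half(rows):
    """Odd-even split: total of items 1, 3, 5, ... vs. items 2, 4, 6, ..."""
    odd = [sum(r[0::2]) for r in rows]
    even = [sum(r[1::2]) for r in rows]
    r_hh = pearson_r(odd, even)
    return r_hh, step_up(r_hh)


# Hypothetical 0/1 responses: 6 examinees x 6 items
rows = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
r_hh, r_sb = split_half(rows)
print(round(r_hh, 2), round(r_sb, 2))
print(round(step_up(0.70), 2))  # 0.82, the example above
```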
α, rK-R20
$\hat{\alpha} > 0$ when the $r_{ij}$ are positive; $\hat{\alpha}$ can fall below 0 when $\bar{r}_{ij} < 0$
Reverse coding
0 = disagree, 1 = uncertain, 2 = agree
1. My general health is good.
2. I often feel unhealthy.
3. I worry little about my health.
0 = disagree, 1 = uncertain, 2 = agree
1. My general health is good.
3. I worry little about my health.
2 = disagree, 1 = uncertain, 0 = agree
2. I often feel unhealthy.
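Reverse coding on a 0–2 scale maps each response x to 2 − x (in general, min + max − x). A sketch for the three health items; the response values and helper names are hypothetical.

```python
# Hypothetical responses, scored 0 = disagree, 1 = uncertain, 2 = agree
responses = {"item1": 2, "item2": 0, "item3": 1}
REVERSED = {"item2"}  # "I often feel unhealthy." is negatively worded


def reverse_code(value: int, max_score: int = 2, min_score: int = 0) -> int:
    """Reverse a rating: min + max - value (here, 2 - value)."""
    return min_score + max_score - value


scored = {k: reverse_code(v) if k in REVERSED else v for k, v in responses.items()}
print(scored)                # {'item1': 2, 'item2': 2, 'item3': 1}
print(sum(scored.values()))  # 5
```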
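$n$ = number of items, $\bar{r}_{ij}$ = mean inter-item correlation
$$\hat{\alpha} = \frac{n}{n - 1}\left(\frac{s_t^2 - \sum s_i^2}{s_t^2}\right)$$
$$r_{K\text{-}R20} = \frac{n}{n - 1}\left(\frac{s_t^2 - \sum p_i q_i}{s_t^2}\right)$$
where $s_i^2$ = item variance, $s_t^2$ = total-score variance, $p_i$ = proportion passing item i, $q_i = 1 - p_i$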
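A sketch of both coefficients from a persons × items score matrix. It uses population (N-denominator) variances throughout so that, for 0/1 items, s²_i equals p_i q_i and the two functions agree. The function names and the data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """alpha = n/(n - 1) * (s_t^2 - sum of s_i^2) / s_t^2.
    rows: one list of item scores per examinee; s_t^2 is the total-score variance."""
    n = len(rows[0])                                   # number of items
    sum_item_var = sum(pvariance(col) for col in zip(*rows))
    total_var = pvariance([sum(r) for r in rows])
    return n / (n - 1) * (total_var - sum_item_var) / total_var


def kr20(rows):
    """K-R 20 for 0/1 items: each item variance is replaced by p_i * q_i."""
    n = len(rows[0])
    sum_pq = 0.0
    for col in zip(*rows):
        p = sum(col) / len(col)                        # proportion passing the item
        sum_pq += p * (1 - p)                          # q_i = 1 - p_i
    total_var = pvariance([sum(r) for r in rows])
    return n / (n - 1) * (total_var - sum_pq) / total_var


# Hypothetical 0/1 responses: 5 examinees x 4 items
rows = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(cronbach_alpha(rows), 3), round(kr20(rows), 3))
```

If items are reverse coded first, as in the health example, negatively worded items stop pulling the covariances, and so α, down.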
BDI-II Example
WAIS-IV Vocabulary
Test-retest: rtt = .89, 1 − .89 = .11
Internal consistency: α = .94, 1 − .94 = .06
Interrater: r = .95, 1 − .95 = .05
Total error: ∑ = .11 + .06 + .05 = .22; 1 − .22 = .78
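The same arithmetic in code, treating each 1 − r as the error proportion for one source and summing them as the slide does. The dictionary labels are descriptive only.

```python
# Reliability coefficients for WAIS-IV Vocabulary, as given on the slide
coefficients = {"test-retest (rtt)": 0.89, "internal consistency (alpha)": 0.94, "interrater (r)": 0.95}

# Each 1 - r is read as the error proportion for that source
errors = {source: 1 - r for source, r in coefficients.items()}
total_error = sum(errors.values())

for source, e in errors.items():
    print(f"{source}: {e:.2f}")
print(f"total error {total_error:.2f}, remaining {1 - total_error:.2f}")  # .22 and .78
```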
S-B revisited
$$r_{S\text{-}B} = \frac{k\,r_{XX}}{1 + (k - 1)\,r_{XX}}, \qquad k = \frac{n_{new}}{n_{old}}$$
E.g., n = 80, rXX = .75
Reduce to 5
k = 5/80 = .0625
$$r_{S\text{-}B} = \frac{.0625(.75)}{1 + (.0625 - 1)(.75)} = .158$$
E.g., n = 10, rXX = .55
Double length
k = 20/10 = 2
$$r_{S\text{-}B} = \frac{2(.55)}{1 + (2 - 1)(.55)} = .710$$
E.g., rXX = .35, rS-B = .90
Find k
$$.90 = \frac{k(.35)}{1 + (k - 1)(.35)}$$
$$\begin{aligned}
.90[1 + (k - 1)(.35)] &= .35k \\
.90(1 - .35 + .35k) &= .35k \\
.90 - .315 + .315k &= .35k \\
.90 - .315 &= .35k - .315k \\
.585 &= .035k \\
k &= .585/.035 = 16.71
\end{aligned}$$
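Both uses of the Spearman-Brown formula in code: predicting reliability at a new length, and solving for k given a target reliability. Solving the formula for k gives k = r_target(1 − r_XX)/[r_XX(1 − r_target)], which reproduces the .585/.035 step above. Function names are hypothetical.

```python
def spearman_brown(r_xx: float, k: float) -> float:
    """Predicted reliability when test length changes by k = n_new / n_old."""
    return k * r_xx / (1 + (k - 1) * r_xx)


def length_factor(r_xx: float, r_target: float) -> float:
    """Solve the Spearman-Brown formula for k, given current and target reliability."""
    return r_target * (1 - r_xx) / (r_xx * (1 - r_target))


print(round(spearman_brown(0.70, 2), 2))       # 0.82 (split-half example)
print(round(spearman_brown(0.75, 5 / 80), 3))  # 0.158
print(round(spearman_brown(0.55, 2), 3))       # 0.71
print(round(length_factor(0.35, 0.90), 2))     # 16.71
```

A k of 16.71 means the test would need to be almost 17 times its current length to reach .90, which is usually impractical.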
X = t + e
X varies around t by ±SDe
SEM = SDe
$$SEM = SD_X\sqrt{1 - r_{XX}}$$
If rXX = 1, then SEM = 0
If rXX = 0, then SEM = SDX
E.g., rXX = .80, SDX = 15.0
$$SEM = 15.0\sqrt{1 - .80} = 6.71$$
E.g., X = 92
95% CI
X ± SEM (z.05)
z.05 = 1.96
92 ± 6.71 (1.96)
92 ± 13.15, [78.85, 105.15]
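A sketch of the SEM and the 95% interval around an observed score. It takes z from `statistics.NormalDist` instead of the rounded 1.96, so endpoints could differ from hand calculation in the third decimal. `sem` and `confidence_interval` are hypothetical names.

```python
from math import sqrt
from statistics import NormalDist


def sem(sd_x: float, r_xx: float) -> float:
    """Standard error of measurement: SD_X * sqrt(1 - r_XX)."""
    return sd_x * sqrt(1 - r_xx)


def confidence_interval(center: float, se: float, level: float = 0.95):
    """Symmetric normal-theory interval: center +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return center - z * se, center + z * se


e = sem(15.0, 0.80)
lower, upper = confidence_interval(92, e)
print(round(e, 2), round(lower, 2), round(upper, 2))  # 6.71 78.85 105.15
```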
T′ = rXX (X – M) + M
E.g., X = 92,
rXX = .80, M = 100.0
T′ = .80 (92 – 100.0) + 100.0
= 93.6
93.6 ± 6.71 (1.96)
[80.45, 106.75]
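The estimated true score pulls X toward the mean in proportion to 1 − r_XX. A sketch that follows the slide's choice of centering the interval on T′ and using the SEM (6.71) with z = 1.96; `estimated_true_score` is a hypothetical name.

```python
def estimated_true_score(x: float, r_xx: float, mean: float) -> float:
    """Regress an observed score toward the mean: T' = r_XX (X - M) + M."""
    return r_xx * (x - mean) + mean


t_hat = estimated_true_score(92, 0.80, 100.0)
half_width = 1.96 * 6.71  # z_.05 * SEM, as on the slide
print(round(t_hat, 1), round(t_hat - half_width, 2), round(t_hat + half_width, 2))  # 93.6 80.45 106.75
```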
Validity
Kane (2013)
1. Context of use
2. Score interpretation
3. Evidence needed
Evidence
1. Test content
2. Internal structure
3. Covariance
4. Response process
Test specifications
Input (content)
Operations
Output
Number of items
Time limits
Difficulty
E.g., real estate law
E.g., Grade 6 biology
E.g., Revised SAT
https://collegereadiness.collegeboard.org/pdf/test-specifications-redesigned-sat-1.pdf
E.g., KABC-I
8 subtests
2 factors
Sequential scale
Hand Movements, HM
Number Recall, NR
Word Order, WO
Simultaneous scale
Gestalt Closure, GC
Triangles, Tr
Spatial Memory, SM
Matrix Analogies, MA
Photo Series, PS
[Figure: scatterplot of Y against X]
$$\hat{Y} = 2.5X + 40.0$$
Given X = 17, no Y
$$\hat{Y} = 2.5(17) + 40.0 = 82.5$$
$$\hat{Y} \pm SE_{est}(1.96)$$
[Figure: scatterplot of Y against X]
$$SE_{est} = SD_Y\sqrt{1 - r_{XY}^2}$$
If rXY = 1, then SEest = 0
If rXY = 0, then SEest = SDY
E.g., rXY = .60, SDY = 7.5, Ŷ = 82.5
$$SE_{est} = 7.5\sqrt{1 - .60^2} = 6.0$$
95% CI
Y ± SEest (z.05)
82.5 ± 6.0 (1.96)
82.5 ± 11.76
[70.74, 94.26]
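Prediction and its interval in code, using the slide's regression line (slope 2.5, intercept 40.0) as default arguments. `predict` and `se_est` are hypothetical names.

```python
from math import sqrt


def predict(x: float, slope: float = 2.5, intercept: float = 40.0) -> float:
    """Predicted criterion score from the regression line Y-hat = 2.5X + 40.0."""
    return slope * x + intercept


def se_est(sd_y: float, r_xy: float) -> float:
    """Standard error of estimate: SD_Y * sqrt(1 - r_XY^2)."""
    return sd_y * sqrt(1 - r_xy ** 2)


y_hat = predict(17)
se = se_est(7.5, 0.60)
print(y_hat, round(se, 2), round(y_hat - 1.96 * se, 2), round(y_hat + 1.96 * se, 2))  # 82.5 6.0 70.74 94.26
```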
E.g., rXX = .70, rYY = .10
$$\max|r_{XY}| = \sqrt{r_{XX}\,r_{YY}} = \sqrt{(.70)(.10)} = .26$$
−.26 ≤ rXY ≤ .26
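The upper bound on a validity coefficient set by the two reliabilities; `max_validity` is a hypothetical helper name.

```python
from math import sqrt


def max_validity(r_xx: float, r_yy: float) -> float:
    """Largest possible |r_XY| given the reliabilities of X and Y."""
    return sqrt(r_xx * r_yy)


print(round(max_validity(0.70, 0.10), 2))  # 0.26
```

With an unreliable criterion (rYY = .10), even a perfectly valid predictor cannot correlate above about .26 with it.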
Jingle
Same name, must be same
Jangle
Different name, must be different
X1, X2
Same trait, same method
r12 = .60…?
X1, X2
Same trait, Δ methods
r12 = .10
But … X1, X2
Same trait, “leadership”
Δ methods
r12 = .75
Both measure leadership (no)
X1, X2
Δ traits, same method
r12 = .60…?
X1, X2
Δ traits, Δ methods
r12 = .10
Items
Rate your success in life
0 1 2 3 4 5 6 7 8 9 10
Not at all
successful
Extremely
successful
−5 −4 −3 −2 −1 0 1 2 3 4 5
Not at all
successful
Extremely
successful
Schwarz et al. (1991)
0 to 10
0 to 5: 30%
Unipolar (degree of success)
−5 to 5
−5 to 0: 15%
Bipolar (failure to success)
Which of the following describes you best?
a. I am outgoing
b. I work hard
Item statistics
Difficulty (p)
Discrimination (D)
Item-total, rit
$$s_i^2 = p(1 - p)$$
$$\max s_i^2 \text{ at } p = .5$$
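Item difficulty and item variance for 0/1 items in code; the loop shows the variance peaking (.25) at p = .5. Function names are hypothetical.

```python
def item_difficulty(scores):
    """p: proportion of examinees who answered the item correctly (0/1 scores)."""
    return sum(scores) / len(scores)


def item_variance(p):
    """Variance of a 0/1 item: s_i^2 = p(1 - p)."""
    return p * (1 - p)


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(item_variance(p), 2))
```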
D = p (U) – p (L)
E.g., p (U) = .65, p (L) = .25
D = .65 − .25 = .40
Kelley (1939)
U: ↑ 27%
L: ↓ 27%
$\max s_D^2$
Item-total, rit
Keep rit > 0
Drop rit < 0
rit > 0
↑ $\bar{r}_{ij}$, ↑ α
rit < 0
↓ $\bar{r}_{ij}$, ↓ α, maybe < 0
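A sketch of the discrimination index D with Kelley's upper and lower 27% groups, plus an item-total correlation. The "corrected" option, which removes the item from the total before correlating, is one common variant and an assumption here; ties in total score are broken arbitrarily. Names and data are hypothetical.

```python
from statistics import mean, pstdev


def discrimination_index(item, total, fraction=0.27):
    """D = p(U) - p(L) for the top and bottom `fraction` of examinees by total score."""
    n_group = max(1, round(fraction * len(total)))
    ranked = sorted(range(len(total)), key=lambda i: total[i])  # lowest total first
    lower, upper = ranked[:n_group], ranked[-n_group:]
    p_upper = sum(item[i] for i in upper) / n_group
    p_lower = sum(item[i] for i in lower) / n_group
    return p_upper - p_lower


def item_total_r(item, total, corrected=True):
    """Item-total correlation r_it; if corrected, the item is removed from the total first."""
    rest = [t - x for t, x in zip(total, item)] if corrected else list(total)
    m_i, m_r = mean(item), mean(rest)
    cov = mean([(a - m_i) * (b - m_r) for a, b in zip(item, rest)])
    return cov / (pstdev(item) * pstdev(rest))


# Hypothetical data: 10 examinees, 0/1 scores on one item and their total scores
item = [0, 0, 1, 1, 0, 0, 1, 0, 1, 1]
total = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(round(discrimination_index(item, total), 2), round(item_total_r(item, total), 2))
```

An item with r_it < 0 would be flagged for dropping under the rule above.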
ψ 315 Outcome
Test score   OK    Not OK
80–100%      88%   12%
40–79%       77%   23%
0–39%        55%   45%
[Figure: item characteristic curve (ICC), probability of a correct response (0 to 1.0) against latent ability θ (−3.0 to 3.0), with guessing, difficulty, and discrimination labeled]
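The ICC labels three item parameters. One common way to write such a curve is the three-parameter logistic (3PL) model, P(θ) = c + (1 − c)/(1 + e^(−a(θ − b))). This sketch assumes that model without the 1.7 scaling constant some texts include, and the parameter values are made up for illustration.

```python
from math import exp


def icc_3pl(theta, a, b, c):
    """3PL ICC: probability of a correct response at ability theta.
    a = discrimination (slope), b = difficulty (location), c = guessing (lower asymptote)."""
    return c + (1 - c) / (1 + exp(-a * (theta - b)))


# Hypothetical item: a = 1.5, b = 0.0, c = .20
for theta in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    print(f"{theta:5.1f}  {icc_3pl(theta, a=1.5, b=0.0, c=0.20):.2f}")
```

At θ = b the probability is halfway between c and 1, which is how the difficulty parameter locates the curve.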
Ability
Ontario
A learning disorder evident in both academic and social situations that involves one or
more of the processes necessary for the proper use of spoken language or the symbols of
communication, and that is characterized by a condition that
a. is not primarily the result of
impairment of vision;
impairment of hearing;
physical handicap;
mental retardation;
primary emotional disturbance; or
cultural difference; and
b. results in a significant discrepancy between academic achievement and assessed
intellectual ability, with defects in one or more of:
receptive language (i.e., listening, reading);
language processing (i.e., thinking, conceptualizing, integrating);
expressive language (i.e., talking, spelling, writing);
mathematical computations; and
c. may be associated with one or more conditions diagnosed as:
a perceptual handicap;
a brain injury;
minimal brain dysfunction;
dyslexia; or
developmental aphasia.
≥ 130 Very superior
120–129 Superior
110–119 High average
90–109 Average
80–89 Low average
70–79 Low (Borderline)
< 70 Very low
MI IQ
Visual-spatial Linguistic Logical-mathematical
Bodily-kinesthetic Musical Interpersonal Intrapersonal Naturalistic
Got Got Got Got
Hero Hero
Aid Aid Aid Aid
× 3
WISC-V
6.0–16.11 yrs.
Full Scale (Short)
Primary Index (5)
Ancillary Index (5)
Complementary Index (3)
Full Scale (Short)
Primary (5)
Verbal Comprehension
Fluid Reasoning
Visual-Spatial
Working Memory
Processing Speed
Ancillary (5)
Quantitative Reasoning
Auditory Working Memory
Nonverbal
General Ability
Cognitive Proficiency
Complementary Index (3)
Naming Speed
Symbol Translation
Storage and Retrieval
Cormier et al. (2016)
N = 880 (2,200)
80, 11 age grps.
6-0 to 16-11 yrs.
English
QC, E areas only
WISC-VCDN-F
https://www.pearsonclinical.ca/content/dam/school/global/clinical/canada/programs/WISC5-FR/WISC-V-CDN-FR_FAQ.pdf
KABC-I, 1984, 2.5 to 12.5 yrs.
Sequential
Simultaneous
MPC
Achievement
Translation
Ercikan et al. (2004)
DIF, 18–36% items
40%, translation
30%, curriculum
Bias
https://news.slashdot.org/story/19/05/16/2138210/sat-to-add-adversity-score-that-rates-students-hardships
https://www.forbes.com/sites/michaeltnietzel/2019/05/16/four-reasons-why-the-college-boards-new-adversity-score-is-a-bad-idea/#71abe8486c0e
Bias types
Content
Predictive
Construct
Language
Wording
Idioms, slang
Laundra & Sutton (2008)
1) Translate this phrase: “Jet to the Jects.”
a. Run home
b. Walk to the store
c. Go to the house of your significant other
d. Go to the projects
5) A “blunthead” is a:
a. Brother or male cousin
b. Person who is mentally ill
c. Pencil or pen
d. Person who smokes a lot of marijuana
6) What is “cakin’ it” ?
a. Arguing
b. Making cornbread
c. Being lovey-dovey with your boyfriend or girlfriend
d. Making pancakes
8) One of these things is not like the other. Which word is out of place?
a. Shawdy
b. Ma
c. Shorty
d. Boss
9) Who were the rappers involved in the first and most famous Rap rival?
a. Jay-Z and Nas
b. Ja Rule and DMX
c. Biggie and Tupac
d. Eminem and Benzino
10) What is “gwap”?
a. A term used to refer to money
b. A term used to refer to male genitalia
c. A term used to refer to nice shoes
d. Another name for a college or university
13) Being “boo’d up” means that you are:
a. Cool
b. Spending time with your boyfriend or girlfriend
c. Constipated
d. Being ridiculed in public
Ethics
Kaufman (1990)
Agree with model (%)
Experienced, 32
Inexperienced, 35
Range for IQ = 110
Experienced, 107–115
Inexperienced, 108–117
K-WIIS report sections
1. Referral & background
2. Physical observations
3. Test behaviors
4. WISC-III scaled scores
5. WISC-III IQ, Index scores
6. IQ-ACH Δs
Observed Δ
IQ – YACH
Predicted Δ
ŶACH, rYY
ŶACH – YACH
Example of a K-WIIS Report
WISC-III Interpretive Report
Name: Tony Date of Evaluation: [omitted]
Date of Birth: [omitted] Grade: First
Chronological Age: 7 yrs. 6 mos. Examiner: Kline
Referral Information
Tony was referred for evaluation by his teacher because of learning problems in
school and attention and concentration difficulties. The main goals of this evaluation
were to answer the following questions: Is Tony in an appropriate classroom setting? Are
special education services recommended for Tony? Should Tony be monitored for future
developments?
Tony is a Caucasian male, age 7 years and 6 months. The background information
presented here about Tony is primarily based on reports from his mother and also from his
teacher. Tony's parents immigrated to this country. His parents are bilingual and
Italian is spoken in Tony's home. He lives with his biological parents and is the only
child in his residence. His family economic status is working class. As a child, his
home environment was average, that is, neither impoverished nor enriched. Cultural
opportunities at home (e.g., availability of books, family trips to museums) are average,
neither inadequate nor excellent. Both Tony's mother and his father graduated from high
school.
WISC-III Psychometric Summary

IQs    Score  %ile  90% C.I.
VIQ     88     21    83–94
PIQ    121     92   112–126
FSIQ   104     61    99–109

Factor Indexes  Score  %ile  90% C.I.
VC               92     30    87–98
PO              130     98   120–134
FD               72      3    68–83
PS               83     13    77–94

Verbal               Scaled Score  %ile
Information               9         37
Similarities              8         25
Arithmetic                5          5
Vocabulary                7         16
Comprehension            10         50
(Digit Span)              5          5

Performance          Scaled Score  %ile
Picture Completion       11         63
Coding                   7-W        16
Picture Arrangement     18-S       >99
Block Design             13         84
Object Assembly         17-S        99
(Symbol Search)          6-W         9
Tony was given the WISC-III, a test that evaluates the present level of intellectual functioning of children and adolescents. He scored in the Average range of intelligence, earning a Full Scale IQ of 104. Tony's overall performance on the WISC-III ranks him at the 61st percentile relative to other 7-year-olds. The chances are very good (about 19 out of 20) that Tony's true Full Scale IQ is likely to fall between 99 and 109. For Tony, however, the Full Scale IQ is not meaningful because he displayed a striking discrepancy between his verbal and nonverbal intelligence.
Tony's Performance IQ of 121 is significantly and strikingly higher than his Verbal
IQ of 88. His High Average to Superior PIQ (92nd percentile), when compared to his Low Average to Average VIQ (21st percentile), suggests that his intelligence on these two scales is inconsistent.
Tony's strikingly lower verbal abilities may be related to his referral for a
possible learning problem, bilingual parents, reported weakness in vocabulary and verbal expression, and delayed social development.
Recommendations
Tony has been referred for school learning problems and may require remediation. If so, then the following suggestions may prove beneficial for Tony:
A. Individualize each area of instruction so that Tony is taught at the appropriate readiness level for each different skill.
B. Teach to Tony's tolerance level and avoid pushing beyond. (For example, help teacher pinpoint his threshold level and stay at it.)
C. Begin new tasks only when you know Tony is not tired and is "ready" to learn.
Tony had a weakness in auditory and visual short term memory. To help Tony with his memory problem:
A. Employ distributed review: space out the demands for practice of new skills. This will avoid boredom and fatigue, and provide for overlearning and the development of automatic skills. (This is especially useful for math difficulties and word recognition skills.) Suggest ten 3 minute sessions if a half hour's work is necessary.
B. Provide Tony with a written list of reminders. Tony can check each one that is completed off the list as he goes along.
C. Follow a predictable routine, so Tony won't have to learn new formats for completing work successfully. This will help preset expectations and reduce memory load.
D. Don't accept the excuse of "I forgot" to allow Tony to avoid assigned homework or chores. Have him complete the assignment when reminded.
MMPI-2
L, F, K
Hypochondriasis, Hs
Depression, D
Hysteria, Hy
Psychopath. Deviate, Pd
Masc./Fem., Mf
Paranoia, Pa
Psychasthenia, Pt
Schizophrenia, Sc
Hypomania, Ma
Social Introversion, Si
Assessment in Education
Practical Assessment, Research & Evaluation
Assessment & Evaluation in Higher Education
Educational Assessment
Educational Assessment, Evaluation and Accountability
Educational Evaluation and Policy Analysis
Educational Measurement: Issues and Practice
Journal of Educational Measurement
International Journal of Educational and Psychological Assessment
Educational and Psychological Measurement
Applied Psychological Measurement
Psychometrika
Journal of Technology, Learning and Assessment
Assessing Writing
Language Testing
Language Assessment Quarterly