~ Test Construction and Validation ~
Fundamental Points and Practices
Stephen J. Vodanovich, Ph.D.
~ Identifying The Item Domain ~
[a.k.a. Where do the questions come from?]
Item Domain
Test
• Specific, defined content area (e.g., course exam, training program)
• Expert opinion, observation (e.g., professional literature)
• Job analysis (identification of major job tasks, duties)
Job Analysis Overview
Job (or Job Category)
Task Identification: Task 1, Task 2, Task 3, Task 4
KSA Identification: KSA 1, KSA 2, KSA 3, KSA 4
• Rate Tasks and KSAs
• Connect KSAs to Tasks
~ Sample Task Rating Form ~
Tasks 1 through 7 are each rated on the following 5-point scales:

Frequency of use
5 = almost all of the time
4 = frequently
3 = occasionally
2 = seldom
1 = not performed at all

Importance of performing successfully
5 = extremely important
4 = very important
3 = moderately important
2 = slightly important
1 = of no importance

Importance for new hire
5 = extremely important
4 = very important
3 = moderately important
2 = slightly important
1 = of no importance

Distinguishes between superior & adequate performance
5 = a great deal
4 = considerably
3 = moderately
2 = slightly
1 = not at all

Damage if error occurs
5 = extreme damage
4 = considerable damage
3 = moderate damage
2 = very little damage
1 = virtually no damage
~ Sample KSA Rating Form ~
KSAs A through G are each rated on the following 5-point scales:

Importance for acceptable job performance
5 = extremely important
4 = very important
3 = moderately important
2 = slightly important
1 = of no importance

Importance for new hire
5 = extremely important
4 = very important
3 = moderately important
2 = slightly important
1 = of no importance

Distinguishes between superior & adequate performance
5 = a great deal
4 = considerably
3 = moderately
2 = slightly
1 = not at all
Sample Task -- KSA Matrix
To what extent is each KSA needed when performing each job task?

             KSA
Job Tasks    A   B   C   D   E   F   G   H
1
2
3
4
5
6
7

5 = Extremely necessary, the job task cannot be performed without the KSA
4 = Very necessary, the KSA is very helpful when performing the job task
3 = Moderately necessary, the KSA is moderately helpful when performing the job task
2 = Slightly necessary, the KSA is slightly helpful when performing the job task
1 = Not necessary, the KSA is not used when performing the job task
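As a rough illustration of how a completed matrix might be read, the sketch below lists, for each KSA, the tasks where it was rated 4 ("very necessary") or higher. The ratings, the three-task by three-KSA size, and the cutoff of 4 are all assumptions made for the example; the slides do not prescribe a cutoff.

```python
# Hypothetical ratings: keys are job tasks, inner keys are KSAs,
# values use the 5-point necessity scale from the matrix above.
ratings = {
    1: {"A": 5, "B": 2, "C": 1},
    2: {"A": 4, "B": 4, "C": 1},
    3: {"A": 3, "B": 5, "C": 2},
}

CUTOFF = 4  # assumed threshold: "very necessary" or higher


def linked_tasks(ratings, cutoff=CUTOFF):
    """Return {KSA: [tasks where the KSA was rated at or above cutoff]}."""
    links = {}
    for task, row in ratings.items():
        for ksa, score in row.items():
            links.setdefault(ksa, [])
            if score >= cutoff:
                links[ksa].append(task)
    return links


links = linked_tasks(ratings)
for ksa in sorted(links):
    print(ksa, "is linked to tasks", links[ksa])
```

A KSA with an empty task list (C in this made-up data) would be a candidate for dropping from the test plan, since no task depends on it at the chosen cutoff.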
~ Writing Test Items ~
• Write a lot of questions
• Write more questions for the most critical KSAs
• Consider the reading level of the test takers
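One possible way to act on "write more questions for the most critical KSAs" is to split a planned item count in proportion to each KSA's importance rating. The sketch below is only an illustration: the KSA names, the integer importance ratings, the 40-item total, and the largest-remainder rounding rule are all assumptions, not part of the original material.

```python
def allocate_items(weights, total_items):
    """Split total_items across KSAs in proportion to integer weights.

    Uses largest-remainder rounding so the counts always sum to total_items.
    """
    weight_sum = sum(weights.values())
    counts, remainders = {}, {}
    for ksa, weight in weights.items():
        # Integer quota and the leftover fraction (kept as an integer remainder).
        counts[ksa], remainders[ksa] = divmod(total_items * weight, weight_sum)
    leftover = total_items - sum(counts.values())
    # Hand the unassigned items to the KSAs with the largest remainders.
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for ksa in by_remainder[:leftover]:
        counts[ksa] += 1
    return counts


# Hypothetical importance ratings on a 5-point scale.
importance = {"A": 5, "B": 4, "C": 2}
allocation = allocate_items(importance, total_items=40)
print(allocation)
```

Because many items are typically discarded after review and item analysis, the total here would usually be set well above the intended final test length.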
~ Selecting Test Items ~
• Initial review by Subject Matter Experts (SMEs)
• Connect items to KSAs
• Assess difficulty of items relative to job requirements
• Suggest revisions to items and answers
Sample Item Rating Form
• Connect each item to a KSA or two
• Rate the difficulty of each item (5-point scale) relative to the level of KSA needed in the job
~ Statistical Properties of Items ~
• Item difficulty levels. The goal is to keep items of moderate difficulty (e.g., p values between .40 and .60)
• The "p value" is the percentage of people getting each item correct
[Figure: distribution curve with the axis marked from -4 to +4 around the Mean]
RELIABILITY ANALYSIS - SCALE (ALL)

Item    Mean     Std Dev   Cases
Q1      .7167    .4525     120.0
Q2      .7583    .4299     120.0
Q3      .8167    .3886     120.0
Q4      .9333    .2505     120.0
Q5      .9583    .2007     120.0
Q6      .9000    .3013     120.0
Q7      .6333    .4839     120.0
Q8      .8750    .3321     120.0
Q9      .8000    .4017     120.0
Q10     .6167    .4882     120.0
Q11     .9750    .1568     120.0
Q12     .8083    .3953     120.0
Q13     .7583    .4299     120.0
Q14     .5083    .5020     120.0
Answers are scored as correct ("1") or wrong ("0"). So, the mean
of each item is its p value (the difficulty level, or the proportion
of people getting the item correct).
Acceptable items
Easy items
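The relationship between 0/1 scoring and the p value can be sketched in a few lines. The item means below are copied from the reliability output above; the function names and the use of the .40 to .60 band as a hard filter are assumptions made for illustration (the slide gives that band only as an example).

```python
def p_value(scores):
    """Proportion correct for one item, given 0/1 scores."""
    return sum(scores) / len(scores)


def in_band(p, low=0.40, high=0.60):
    """True if an item's p value falls inside the target difficulty band."""
    return low <= p <= high


# Item means (p values) as reported in the output above.
item_means = {
    "Q1": .7167, "Q2": .7583, "Q3": .8167, "Q4": .9333, "Q5": .9583,
    "Q6": .9000, "Q7": .6333, "Q8": .8750, "Q9": .8000, "Q10": .6167,
    "Q11": .9750, "Q12": .8083, "Q13": .7583, "Q14": .5083,
}

moderate = [item for item, p in item_means.items() if in_band(p)]
easiest = max(item_means, key=item_means.get)

print("Items inside the .40-.60 band:", moderate)
print("Easiest item:", easiest)
print("p value for scores 1, 1, 0, 1:", p_value([1, 1, 0, 1]))
```

Applied strictly, the example band retains very few of these items, which suggests treating it as a target rather than a fixed rule when most items turn out to be easy.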
~ Statistical Properties of Items (cont.) ~
• Item correlations with each other. The goal is to select items that relate moderately to each other, or "hang together" reasonably well (e.g., item-total score correlations between .40 and .60; see also the "alpha if item deleted" information)
Internal Consistency
~ Item-Total Statistics ~

Item    Scale mean        Scale variance    Corrected item-      Alpha if
        if item deleted   if item deleted   total correlation    item deleted
Q1      43.3750           67.0599           .2285                .8356
Q2      43.3333           67.7031           .1513                .8370
Q3      43.2750           66.5708           .3527                .8335
Q4      43.1583           67.7814           .2700                .8354
Q5      43.1333           68.6711           .0741                .8374
Q6      43.1917           68.8117           .0111                .8385
Q7      43.4583           65.8302           .3685                .8327
Q8      43.2167           67.0283           .3346                .8341
Q9      43.2917           65.9562           .4353                .8319
Q10     43.4750           67.4952           .1526                .8373
Q11     43.1167           68.8938           .0152                .8378
Q12     43.2833           67.9022           .1381                .8371
Q13     43.3333           65.9216           .4085                .8322
Q14     43.5833           65.2871           .4214                .8315

Alpha = .8374
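For readers who want to see where the columns in this kind of output come from, here is a minimal sketch of coefficient alpha, the corrected item-total correlation, and "alpha if item deleted", using only the Python standard library. The six-person, four-item response matrix is invented for the example, and the function names are hypothetical; this is not the procedure that produced the figures above.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents' item-score lists."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def corrected_item_total(rows, i):
    """Correlate item i with the total of the remaining items."""
    item = [r[i] for r in rows]
    rest = [sum(r) - r[i] for r in rows]
    return pearson(item, rest)


def alpha_if_deleted(rows, i):
    """Alpha recomputed with item i removed."""
    return cronbach_alpha([r[:i] + r[i + 1:] for r in rows])


# Hypothetical 0/1 responses: 6 respondents by 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

alpha = cronbach_alpha(responses)
item_stats = [
    (corrected_item_total(responses, i), alpha_if_deleted(responses, i))
    for i in range(len(responses[0]))
]
print(f"alpha = {alpha:.4f}")
for i, (r_it, a_del) in enumerate(item_stats, start=1):
    print(f"Q{i}: corrected item-total r = {r_it:.4f}, alpha if deleted = {a_del:.4f}")
```

An item whose "alpha if item deleted" is higher than the overall alpha is one whose removal would make the remaining items hang together better, which is how Q6 and Q11 read in the output above.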
Kirkland v. Department of Correctional Services (1974)
"Without such an analysis (job analysis) to single out the critical knowledge, skills and abilities required by the job, their importance relative to each other, and the level of proficiency demanded as to each attribute, a test constructor is aiming in the dark and can only hope to achieve job relatedness by blind luck."
~ Legal Concerns ~
• The KSAs tested for must be critical to successful job performance
• Portions of the exam should be accurately weighted to reflect the relative importance to the job of the attributes for which they test
• The level of difficulty of the exam material should match the level of difficulty of the job
Construct Validation
Traits A, B, and C are each measured by three methods.

                                 Method 1           Method 2               Method 3
                                 (Paper & Pencil)   (Clinical Interview)   (Peer observation)
Method 1 (Paper & Pencil)        Mono Method
Method 2 (Clinical Interview)    Hetero Method      Mono Method
Method 3 (Peer observation)      Hetero Method      Hetero Method          Mono Method
Example: Traits A (Boredom), B (Dep), and C (Anxiety), each measured by Method 1 (Paper & Pencil), Method 2 (Clinical Interview), and Method 3 (Peer observation).

Reliability Figures
.89 .91 .87 .92 .93 .82 .90 .93 .87

Mono-Trait; Hetero-Method
.55 .46 .53 .61 .54 .66 .55 .46 .53

Hetero-Trait; Mono Method
.49 .33 .36 .54 .62 .55 .49 .54 .52

Hetero-Trait; Hetero-Method
.20 .20 .15 .15 .08 .12 .20 .21 .15 .15 .15 .13 .35 .41 .40 .37 .31 .32
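A common way to read a matrix like this is to compare the typical size of each kind of correlation: same-trait, different-method values (convergent evidence) should exceed different-trait values, whether those share a method or not (discriminant evidence). The sketch below averages the correlations listed in the example. The values are taken from the slides; summarizing each group by a simple mean is an assumption made for illustration, not a step shown in the original.

```python
from statistics import mean

# Correlations from the example matrix, grouped by type.
mono_trait_hetero_method = [.55, .46, .53, .61, .54, .66, .55, .46, .53]
hetero_trait_mono_method = [.49, .33, .36, .54, .62, .55, .49, .54, .52]
hetero_trait_hetero_method = [
    .20, .20, .15, .15, .08, .12, .20, .21, .15,
    .15, .15, .13, .35, .41, .40, .37, .31, .32,
]

convergent = mean(mono_trait_hetero_method)
same_method = mean(hetero_trait_mono_method)
different_everything = mean(hetero_trait_hetero_method)

print(f"Mono-trait, hetero-method mean:   {convergent:.3f}")
print(f"Hetero-trait, mono-method mean:   {same_method:.3f}")
print(f"Hetero-trait, hetero-method mean: {different_everything:.3f}")

# Convergent values should be the largest of the three groups.
supports_construct_validity = convergent > same_method > different_everything
print("Pattern consistent with construct validity:", supports_construct_validity)
```

Averages hide cell-level exceptions, so a fuller check would compare each same-trait value against the different-trait values in its own row and column of the matrix.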