~ Test Construction and Validation ~

Fundamental Points and Practices

Stephen J. Vodanovich, Ph.D.

~ Identifying The Item Domain ~

[a.k.a. Where do the questions come from?]

Item domain (the content from which a test's questions are drawn):

• Specific, defined content area (e.g., course exam, training program)
• Expert opinion, observation (e.g., professional literature)
• Job analysis (identification of major job tasks and duties)

Job Analysis Overview

Job (or job category)
• Task identification: Task 1, Task 2, Task 3, Task 4
• KSA (knowledge, skills, and abilities) identification: KSA 1, KSA 2, KSA 3, KSA 4
• Rate tasks and KSAs
• Connect KSAs to tasks

~ Sample Task Rating Form ~

Each job task (Tasks 1 through 7 on the form) is rated on the following scales:

Frequency of use
5 = almost all of the time
4 = frequently
3 = occasionally
2 = seldom
1 = not performed at all

Importance of performing successfully
5 = extremely important
4 = very important
3 = moderately important
2 = slightly important
1 = of no importance

Importance for new hire

Distinguishes between superior & adequate performance
5 = a great deal
4 = considerably
3 = moderately
2 = slightly
1 = not at all

Damage if error occurs
5 = extreme damage
4 = considerable damage
3 = moderate damage
2 = very little damage
1 = virtually no damage
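One way to summarize this form is to average each scale across subject matter experts (SMEs) and then combine the scale means into a single criticality score per task. The sketch below is illustrative only: the ratings are hypothetical, it uses just three tasks and three of the scales (frequency of use, importance of performing successfully, damage if error occurs), and the equal-weight composite and the names `task_profile` and `rank_tasks` are assumptions, not a prescribed scoring rule.

```python
from statistics import mean

# Hypothetical SME ratings on the 1-5 anchors above, one value per SME.
# Only three tasks and three of the five scales are shown, for brevity.
ratings = {
    "Task 1": {"frequency": [5, 4, 5], "importance": [5, 5, 4], "damage": [4, 4, 5]},
    "Task 2": {"frequency": [2, 3, 2], "importance": [3, 2, 3], "damage": [2, 1, 2]},
    "Task 3": {"frequency": [4, 4, 3], "importance": [4, 5, 5], "damage": [5, 5, 4]},
}


def task_profile(scales):
    """Mean of each scale across SMEs, plus an equal-weight composite (an assumption)."""
    scale_means = {name: mean(values) for name, values in scales.items()}
    return scale_means, mean(scale_means.values())


def rank_tasks(all_ratings):
    """(task, composite) pairs, most critical task first."""
    composites = {task: task_profile(scales)[1] for task, scales in all_ratings.items()}
    return sorted(composites.items(), key=lambda item: item[1], reverse=True)


for task, score in rank_tasks(ratings):
    print(f"{task}: {score:.2f}")
```

With these made-up ratings, Task 1 ranks first (about 4.56), then Task 3 (4.33), then Task 2 (2.22). A real job analysis might weight the scales differently or apply cutoffs; the equal weighting here is only for illustration.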

~ Sample KSA Rating Form ~

Each KSA (A through G on the form) is rated on the following scales:

Importance for acceptable job performance

Importance for new hire

Distinguishes between superior & adequate performance
5 = a great deal
4 = considerably
3 = moderately
2 = slightly
1 = not at all

Sample Task-KSA Matrix

To what extent is each KSA needed when performing each job task?

Rows are job tasks 1 through 7; columns are KSAs A through H. Each cell is rated:

5 = Extremely necessary; the job task cannot be performed without the KSA
4 = Very necessary; the KSA is very helpful when performing the job task
3 = Moderately necessary; the KSA is moderately helpful when performing the job task
2 = Slightly necessary; the KSA is slightly helpful when performing the job task
1 = Not necessary; the KSA is not used when performing the job task
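The filled-in matrix can be used to pick out the KSAs that matter most. A minimal sketch, assuming hypothetical linkage ratings for three tasks and four KSAs, hypothetical task weights (for example, rounded composite task ratings), and two illustrative rules: a KSA counts as critical if it is rated 4 (very necessary) or higher for at least one task, and its overall weight is the sum of task weight times linkage rating. Both rules and the names `critical_ksas` and `weighted_ksa_scores` are assumptions, not part of the original method.

```python
# Hypothetical Task-KSA linkage ratings (1-5 scale above).
# Rows are job tasks; columns follow the order of `ksas`.
ksas = ["A", "B", "C", "D"]
linkage = {
    "Task 1": [5, 2, 1, 4],
    "Task 2": [1, 1, 3, 2],
    "Task 3": [4, 5, 1, 1],
}
# Hypothetical task weights, e.g. rounded composites from the task rating form.
task_weight = {"Task 1": 5, "Task 2": 2, "Task 3": 4}


def critical_ksas(ksas, linkage, threshold=4):
    """KSAs rated at or above `threshold` for at least one task (assumed rule)."""
    return [
        ksa
        for j, ksa in enumerate(ksas)
        if any(row[j] >= threshold for row in linkage.values())
    ]


def weighted_ksa_scores(ksas, linkage, task_weight):
    """For each KSA, the sum over tasks of task weight x linkage rating (assumed rule)."""
    return {
        ksa: sum(task_weight[task] * row[j] for task, row in linkage.items())
        for j, ksa in enumerate(ksas)
    }


print(critical_ksas(ksas, linkage))                     # ['A', 'B', 'D']
print(weighted_ksa_scores(ksas, linkage, task_weight))  # {'A': 43, 'B': 32, 'C': 15, 'D': 28}
```

Here KSA C never reaches "very necessary" for any task, so it drops out, and KSA A carries the most weight because it is strongly linked to the two most important tasks.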

~ Writing Test Items ~

• Write many questions (more than the final test will need, since some will be dropped during item review)

• Write more questions for the most critical KSAs

• Consider the reading level of the test takers

~ Selecting Test Items ~

• Initial review by Subject Matter Experts (SMEs)

• Connect items to KSAs

• Assess difficulty of items relative to job requirements

• Suggest revisions to items and answers

Sample Item Rating Form

• Connect each item to one or two KSAs
• Rate the difficulty of each item (5-point scale) relative to the level of the KSA needed on the job

~ Statistical Properties of Items ~

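• Item difficulty levels. The goal is to keep items of moderate difficulty (e.g., p values between .40 and .60).

[Figure: score distribution, with the horizontal axis marked from -4 to +4 around the mean]

An item's “p value” is the proportion (or percentage) of people answering that item correctly.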

RELIABILITY ANALYSIS - SCALE (ALL)

Item   Mean    Std Dev   Cases
Q1     .7167   .4525     120.0
Q2     .7583   .4299     120.0
Q3     .8167   .3886     120.0
Q4     .9333   .2505     120.0
Q5     .9583   .2007     120.0
Q6     .9000   .3013     120.0
Q7     .6333   .4839     120.0
Q8     .8750   .3321     120.0
Q9     .8000   .4017     120.0
Q10    .6167   .4882     120.0
Q11    .9750   .1568     120.0
Q12    .8083   .3953     120.0
Q13    .7583   .4299     120.0
Q14    .5083   .5020     120.0

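Answers are scored as correct (“1”) or wrong (“0”), so each item's mean is its p value: its difficulty level, or the proportion of people answering it correctly. By the .40 to .60 guideline, Q14 (.5083) is an acceptable item, while items with p values of .90 or higher, such as Q4, Q5, Q6, and Q11, are easy items.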
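A short sketch of the difficulty computation: with 0/1 scoring, the p value is just the column mean of the response matrix. The response data, the "hard" label for items below .40, and the helper names `p_values`, `classify`, and `binary_sd` are hypothetical; the .40 to .60 band is the guideline stated above. `binary_sd` also shows that, for a 0/1 item, the Std Dev column follows directly from the p value (using the n - 1 denominator).

```python
import math

# Hypothetical scored responses: 1 = correct, 0 = wrong.
# One row per test taker, one column per item.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
]


def p_values(responses):
    """Item mean of 0/1 scores = proportion answering correctly (the p value)."""
    n = len(responses)
    return [sum(item) / n for item in zip(*responses)]


def classify(p, low=0.40, high=0.60):
    """Label an item using the .40-.60 moderate-difficulty guideline."""
    if p > high:
        return "easy"
    if p < low:
        return "hard"
    return "moderate"


def binary_sd(p, n):
    """Sample SD of a 0/1 item with p value p over n cases (n - 1 denominator)."""
    return math.sqrt(p * (1 - p) * n / (n - 1))


ps = p_values(responses)
print(ps)                                # [0.8, 0.6, 0.4, 1.0]
print([classify(p) for p in ps])         # ['easy', 'moderate', 'moderate', 'easy']
print(round(binary_sd(0.7167, 120), 4))  # 0.4525, matching Q1's Std Dev in the output
```

Applied to the output, `classify(0.5083)` (Q14) gives "moderate" and `classify(0.9750)` (Q11) gives "easy".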

~ Statistical Properties of Items (cont.) ~

• Item correlations with each other. The goal is to select items that relate moderately to each other, or “hang together” reasonably well (e.g., item-total score correlations between .40 and .60; “alpha if item deleted” information).

~ Item-Total Statistics ~

Internal Consistency

Item   Scale Mean   Scale Variance   Corrected Item-     Alpha if
       if Deleted   if Deleted       Total Correlation   Item Deleted
Q1     43.3750      67.0599          .2285               .8356
Q2     43.3333      67.7031          .1513               .8370
Q3     43.2750      66.5708          .3527               .8335
Q4     43.1583      67.7814          .2700               .8354
Q5     43.1333      68.6711          .0741               .8374
Q6     43.1917      68.8117          .0111               .8385
Q7     43.4583      65.8302          .3685               .8327
Q8     43.2167      67.0283          .3346               .8341
Q9     43.2917      65.9562          .4353               .8319
Q10    43.4750      67.4952          .1526               .8373
Q11    43.1167      68.8938          .0152               .8378
Q12    43.2833      67.9022          .1381               .8371
Q13    43.3333      65.9216          .4085               .8322
Q14    43.5833      65.2871          .4214               .8315

Alpha = .8374
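The two columns used for item selection can be computed directly. A minimal sketch, assuming a tiny hypothetical data set (three 0/1 items, four test takers) and hypothetical helper names: `cronbach_alpha` uses the standard formula alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), and the corrected item-total correlation correlates each item with the total of the *other* items, so the item is not correlated with itself.

```python
import math
from statistics import mean, variance

# Hypothetical 0/1 scores: each inner list is one item's scores across four test takers.
items = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]


def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x, y):
    """Pearson correlation; assumes neither x nor y is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_total_stats(items):
    """Per item: (corrected item-total r, alpha if item deleted). Needs 3+ items."""
    results = []
    for i, item in enumerate(items):
        rest = [other for j, other in enumerate(items) if j != i]
        rest_total = [sum(person) for person in zip(*rest)]
        results.append((pearson(item, rest_total), cronbach_alpha(rest)))
    return results


print(round(cronbach_alpha(items), 4))  # 0.6
for i, (r, a) in enumerate(item_total_stats(items), start=1):
    print(f"Item {i}: corrected r = {r:.4f}, alpha if deleted = {a:.4f}")
```

In this toy data, item 3 has a corrected item-total correlation of 0 and deleting it raises alpha from .60 to 1.00. The same signal appears in the output for Q6 and Q11: their corrected correlations are near zero (.0111, .0152) and their alpha-if-deleted values (.8385, .8378) exceed the overall alpha of .8374.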

~ Legal Concerns ~

Kirkland v. New York State Department of Correctional Services (1974)

“Without such an analysis [job analysis] to single out the critical knowledge, skills and abilities required by the job, their importance relative to each other, and the level of proficiency demanded as to each attribute, a test constructor is aiming in the dark and can only hope to achieve job relatedness by blind luck.”

• The KSAs tested for must be critical to successful job performance

• Portions of the exam should be accurately weighted to reflect the relative importance to the job of the attributes for which they test

• The level of difficulty of the exam material should match the level of difficulty of the job
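One way to make the weighting requirement concrete is to split a fixed number of test items across KSAs in proportion to their job-analysis weights. The sketch below assumes hypothetical weights and uses largest-remainder rounding so that the counts add up exactly to the intended test length; proportional allocation and the name `allocate_items` are illustrative choices, not a legal standard.

```python
import math


def allocate_items(weights, total_items):
    """Split `total_items` across KSAs in proportion to `weights`
    (largest-remainder rounding so the counts sum exactly to the total)."""
    total_weight = sum(weights.values())
    quotas = {k: total_items * w / total_weight for k, w in weights.items()}
    counts = {k: math.floor(q) for k, q in quotas.items()}
    leftover = total_items - sum(counts.values())
    # Hand the remaining items to the KSAs with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# Hypothetical relative importance weights for three KSAs.
print(allocate_items({"A": 3, "B": 2, "C": 1}, 20))  # {'A': 10, 'B': 7, 'C': 3}
```

Simple rounding of each quota could produce 19 or 21 items; the largest-remainder step keeps the total fixed while staying as close as possible to the intended proportions.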

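~ Construct Validation ~

Multitrait-multimethod matrix: three traits (A, B, C), each measured by three methods:

• Method 1 (Paper & Pencil)
• Method 2 (Clinical Interview)
• Method 3 (Peer Observation)

Every trait-method combination is correlated with every other, and the matrix divides into blocks by method pair:

                           Method 1    Method 2    Method 3
Method 1 (Paper & Pencil)  Mono
Method 2 (Interview)       Hetero      Mono
Method 3 (Peer Obs.)       Hetero      Hetero      Mono

Mono-method blocks compare traits measured by the same method; hetero-method blocks compare traits measured by different methods.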


Example: traits A (Boredom), B (Depression), and C (Anxiety), each measured by Method 1 (Paper & Pencil), Method 2 (Clinical Interview), and Method 3 (Peer Observation).

Values in the example matrix, by block type:

• Reliability figures (mono-trait; mono-method): .89, .91, .87, .92, .93, .82, .90, .93, .87
• Mono-trait; hetero-method: .55, .46, .53, .61, .54, .66, .55, .46, .53
• Hetero-trait; mono-method: .49, .33, .36, .54, .62, .55, .49, .54, .52
• Hetero-trait; hetero-method: .20, .20, .15, .15, .08, .12, .20, .21, .15, .15, .15, .13, .35, .41, .40, .37, .31, .32
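A coarse way to read the example matrix is to compare average correlations by block type. This sketch groups the example's values into the four block types and checks that the mono-trait, hetero-method (convergent) values exceed both kinds of hetero-trait values on average. It works at the level of block averages rather than individual cells, so it is only a simplification of Campbell and Fiske's element-by-element comparisons; the dictionary layout and labels are assumptions for illustration.

```python
from statistics import mean

# Correlations from the example matrix, grouped by block type.
mtmm = {
    "reliability": [.89, .91, .87, .92, .93, .82, .90, .93, .87],
    "mono-trait, hetero-method": [.55, .46, .53, .61, .54, .66, .55, .46, .53],
    "hetero-trait, mono-method": [.49, .33, .36, .54, .62, .55, .49, .54, .52],
    "hetero-trait, hetero-method": [.20, .20, .15, .15, .08, .12, .20, .21, .15,
                                    .15, .15, .13, .35, .41, .40, .37, .31, .32],
}

means = {block: mean(values) for block, values in mtmm.items()}
for block, m in means.items():
    print(f"{block:28s} mean r = {m:.2f} (n = {len(mtmm[block])})")

# Coarse check on averages: convergent values should exceed both kinds of
# hetero-trait values (a simplification of the cell-by-cell criteria).
convergent = means["mono-trait, hetero-method"]
print(convergent > means["hetero-trait, mono-method"] > means["hetero-trait, hetero-method"])  # True
```

On average the pattern runs reliabilities > mono-trait/hetero-method > hetero-trait/mono-method > hetero-trait/hetero-method, but the averages hide cell-level exceptions: several hetero-trait, mono-method values (e.g., .62, .55) exceed the lowest convergent value (.46), which points to shared method variance.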
