Evaluating validity of criterion-referenced test score interpretations and uses
Takaaki Kumazawa
Kanto Gakuin University
[Image: Kintai Bridge, Japan (wiki)]
Purpose
• The purpose of my talk is to evaluate the validity of criterion-referenced placement test score interpretations and uses using Kane’s (2006) argument-based validity framework
• This presentation is based on a paper I published in the JALT Journal (http://jalt-publications.org/jj/issues/2013-05_35.1)
Classical view of validity
• Validity: the extent to which a test measures what it is supposed to measure
• Three types of validity
  - Criterion-related validity: correlation between an established valid measure and the test being developed
  - Content validity: experts’ judgment on whether items measure what they are supposed to measure
  - Construct validity: statistical examination of whether items measure what they are supposed to measure
Current view of validity
• Validity is “the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME], 1999, p. 9).
Argument-based validity framework
Interpretive argument: lays out the inferences to be drawn from test performance and argues that they are theoretically sound
Validity argument: evaluates the interpretive argument by providing warrants and backing for each inference
Observation → (scoring) → Observed score → (generalization) → Universe score → (extrapolation) → Target score → (decision) → Use
Interpretive argument
• Scoring inference
  - To what extent do examinees get placement items correct, and do high-scoring examinees get more placement items correct?
• Generalization inference
  - To what extent are placement items consistently sampled from a domain and sufficient in number to reduce measurement error?
• Extrapolation inference
  - To what extent does the difficulty of the placement items match the objectives of a reading course?
• Decision inference
  - To what extent do placement decisions that put examinees in their proper level of the course have an impact on washback in the course?
Participants
• 428 Japanese first-year university students majoring in law
• TOEIC scores of about 250-450
• Three courses in the English program
  - Reading
  - Listening
  - TOEIC skills
• Proficiency-based program with three levels
  - Level 1: 60 high-scoring students. Major objective of the reading course: improve reading skills such as fast reading
  - Level 2: about 300 students
  - Level 3: 50 low-scoring students. Major objective of the reading course: re-learn junior high and high school grammar
Criterion-referenced placement test
• Grammar (k = 40)
  - Items are taken from textbooks used in junior high and high schools
  - Grammar points: present, past, and future tenses, continuous, relative pronouns, gerunds, participles, etc.
  - Sample: Hi, I ( ) Ken.
    1. am 2. are 3. is 4. be
• Vocabulary (k = 40)
  - Items are taken from high-frequency words at the 1000-3000 word levels, based on the JACET 8000 list
  - Sample: Bring
    1. 送る (send) 2. 持ってくる (bring) 3. 鳴る (ring) 4. 購入する (buy)
• Reading (k = 10)
  - Two passages are taken from two textbooks used in Level 1 and Level 3 reading classes
  - Sample: How do they travel?
    1. by plane 2. by bus 3. by car 4. by train
Procedures
• On the first day of the semester, the placement test was administered in 45 minutes
• A grammar pretest (k = 55, α = .85) was given on the first day of class in Level 2 classes (n = 51) and Level 3 classes (n = 49)
• Thirty 90-minute lessons were taught over two semesters
• The same grammar test was given as a posttest (α = .92) on the last day of class to the same students (n = 51, 49)
• A course evaluation survey was given to the same students (n = 51, 49)
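The α values reported for the grammar pretest and posttest are Cronbach's alpha coefficients. As a rough illustration of how such a coefficient is obtained from item-level scores, here is a minimal sketch. The function name `cronbach_alpha` and the small response matrix are hypothetical and are not data from the study.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a persons x items score matrix (list of rows)."""
    k = len(responses[0])
    # Variance of each item across persons
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of the total scores across persons
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 0/1 responses: 6 examinees x 3 items (illustration only)
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

With only three items the coefficient is low; a 55-item test like the one described above would normally yield a much higher value.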
Backing for scoring inference
• Item facility
  - 7 items below .29
  - 62 items between .30 and .70
  - 21 items above .71
• Item discrimination
  - 4 items below .19
  - 86 items above .20
• Rasch item difficulty estimates
  - -3.79 to 2.33
• Infit MS
  - 0.80 to 1.30
Backing for generalization inference
• Multivariate generalizability theory (decision study of a persons × items design)
  - Grammar (k = 40, ρ = .85, Φ = .83)
  - Vocabulary (k = 40, ρ = .86, Φ = .84)
  - Reading (k = 10, ρ = .58, Φ = .55)
  - Total (k = 90, ρ = .92, Φ = .91)
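To show why the 10-item reading section is less dependable than the 40-item sections, the sketch below runs a decision study for a single persons × items design. It is a univariate simplification, not the multivariate analysis reported here, and the variance components are hypothetical values chosen only for illustration.

```python
def g_coefficient(var_p, var_pi_e, k):
    """Generalizability coefficient (relative decisions) for k items."""
    return var_p / (var_p + var_pi_e / k)

def phi_coefficient(var_p, var_i, var_pi_e, k):
    """Dependability index Phi (absolute decisions) for k items."""
    return var_p / (var_p + (var_i + var_pi_e) / k)

# Hypothetical variance components (NOT estimates from the study)
VAR_P = 0.03      # persons
VAR_I = 0.02      # items
VAR_PI_E = 0.20   # person-by-item interaction confounded with error

for k in (10, 40, 90):
    rho = g_coefficient(VAR_P, VAR_PI_E, k)
    phi = phi_coefficient(VAR_P, VAR_I, VAR_PI_E, k)
    print(f"k = {k:2d}  rho = {rho:.2f}  Phi = {phi:.2f}")
```

Because the error terms are divided by k, both coefficients rise as items are added, and Φ never exceeds ρ because it also counts item difficulty variance as error.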
Backing for extrapolation inference
Difficulty level estimates
Level                     Difficulty   SE     Infit MS
Junior High grammar         -0.65      0.03   1.00
High School grammar          0.29      0.02   1.00
1000 word level vocab       -0.94      0.03   1.00
2000 word level vocab        0.15      0.03   1.00
3000 word level vocab        0.12      0.05   1.00
Level 3 reading              0.30      0.05   1.00
Level 1 reading              0.73      0.05   1.10
Cut point for Level 1: Level 1 reading
Cut point for Level 3: Junior High grammar and 1000 word level
FACETS map
[Figure: variable map on a logit scale from 3 to -4. Columns: Measr, +students (* = 4), -items (* = 1), -levels, and cut points for Levels 1, 2, 3. Item levels marked on the map: Level 1 Reading, Basic HS Grammar, JACET2000, JACET3000, Jr H Grammar, JACET1000. Cut points marked on the map: Level 1a (1.49), Level 1b (.77), Level 2 (.77-.70), Level 3a (-.70), Level 3b (-.99).]
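As an illustration of how cut points on the logit scale turn a student's ability measure into a placement, here is a minimal sketch. It assumes the .77 and -.70 values marked on the map bound Level 2, and it assigns students exactly on a boundary to Level 2. Neither assumption is stated in the slides, and the function name `place` is hypothetical.

```python
# Assumed operational cut points in logits, read from the map labels
LEVEL1_CUT = 0.77    # above this: Level 1
LEVEL3_CUT = -0.70   # below this: Level 3

def place(measure):
    """Return the course level (1, 2, or 3) for a student ability measure."""
    if measure > LEVEL1_CUT:
        return 1
    if measure < LEVEL3_CUT:
        return 3
    return 2  # boundary cases go to Level 2 (assumption)

# Hypothetical student measures (illustration only)
for m in (1.20, 0.00, -1.00):
    print(m, "-> Level", place(m))
```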
Backing for decision inference
Level 2 and Level 3 students’ (n = 51, 49) grammar pretest and posttest scores (k = 55)
                 Grammar pretest (α = .85)    Grammar posttest (α = .92)
Class level      n     M       SD             n     M       SD
Level 2a         26    30.38   6.34           21    12.14    2.50
Level 2b         25    32.36   8.47           24    28.63    7.93
Level 2          51    31.35   7.45           45    20.93   10.24
Level 3c         25    20.80   5.09           22    26.82    5.21
Level 3d         24    19.88   4.29           23    26.78    5.95
Level 3          49    20.35   4.69           45    26.80    5.53
• Level 2: 11 points down from pretest to posttest; Level 2 students scored higher on the pretest
• Level 3: 6 points up from pretest to posttest; Level 3 students scored higher on the posttest
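The level-wide means in the table are n-weighted averages of the class means, and the pretest-to-posttest change is the difference between them. The sketch below recomputes both from the class rows reported above. The function name `pooled_mean` and the dictionary layout are illustrative choices.

```python
def pooled_mean(groups):
    """Weighted mean of class means; groups is a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

# Class-level (n, M) pairs taken from the table above
pre = {
    "Level 2": [(26, 30.38), (25, 32.36)],
    "Level 3": [(25, 20.80), (24, 19.88)],
}
post = {
    "Level 2": [(21, 12.14), (24, 28.63)],
    "Level 3": [(22, 26.82), (23, 26.78)],
}

# Posttest mean minus pretest mean for each level
gains = {
    level: round(pooled_mean(post[level]) - pooled_mean(pre[level]), 2)
    for level in pre
}
print(gains)
```

The pooled values match the Level 2 and Level 3 rows of the table, which is a quick check that the flattened table was read correctly.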
Validity argument: interpretive argument
• Scoring inference
  - To what extent do examinees get placement items correct, and do high-scoring examinees get more placement items correct?
• Generalization inference
  - To what extent are placement items consistently sampled from a domain and sufficient in number to reduce measurement error?
• Extrapolation inference
  - To what extent does the difficulty of the placement items match the objectives of a reading course?
• Decision inference
  - To what extent do placement decisions that put examinees in their proper level of the course have an impact on washback in the course?
Validity argument
• Scoring inference
  - Because most items were working well, the inference from observation to the observed score was valid
• Generalization inference
  - Because dependability was high and measurement error was small, the inference from the observed score to the universe score was valid
• Extrapolation inference
  - Because the difficulty of the items was appropriate for the objectives of the program, the inference from the universe score to the target score was valid
• Decision inference
  - Because Level 3 students were placed in the right level and were able to improve their grammar test scores, the inference from the target score to test use was valid
Conclusion
• “Validation is simple in principle, but difficult in practice. The argument-based framework provides a relatively pragmatic approach to validation” (Kane, 2012, p. 15).
[Image: William Jolly Bridge, Brisbane (wiki)]
References
• Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: Greenwood Publishing.
• Kane, M. (2012). Validating score interpretations and uses. Language Testing, 29, 3-17. doi: 10.1177/0265532211417210
• Kumazawa, T. (2013). Evaluating validity for in-house placement test score interpretations and uses. JALT Journal, 35, 73-100.