ACCESS for ELLs® Scores, Reliability and Validity
Developed by the Center for Applied Linguistics
Prepared by Dorry Kenyon, CAL
ISBE Meeting, Chicago, IL
February 21, 2007
ISBE Presentation 2/21/2007 © 2007 WIDA/CAL
Outline of my presentation
1. What do scores on ACCESS for ELLs® mean?
2. What do we know about the reliability of ACCESS for ELLs® scores?
3. What do we know about the validity of ACCESS for ELLs® scores?
4. So what does this mean for using scores on ACCESS for ELLs®?
1. What do scores on ACCESS for ELLs® mean?
Two types of scores
WIDA ACCESS for ELLs® Scale Scores = psychometrically-derived measure
WIDA ACCESS for ELLs® Proficiency Level Scores = socially-derived interpretation of the scale score in terms of the WIDA Standards’ Proficiency Level Definitions
What is measured?
Scale Scores (and interpretive Proficiency Level Scores) are given for measures in the four domains:
  Listening
  Speaking
  Reading
  Writing
Scale Scores are combined into four composite scores (which are also interpreted in Proficiency Level Scores):
  Oral (listening and speaking)
  Literacy (reading and writing)
  Comprehension (listening and reading)
  Overall Composite (listening, speaking, reading, and writing)
Weighting of the overall composite
Scale Scores of the four domains are weighted differently in the overall composite score:
  Listening (15%)
  Speaking (15%)
  Reading (35%)
  Writing (35%)
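As a minimal sketch of the weighting above, the overall composite can be computed as a weighted sum of the four domain scale scores. The function name, the example student, and the rounding to a whole number are assumptions made for illustration; the slides state only the weights.

```python
# Domain weights for the overall composite, in percent (taken from the slide).
WEIGHTS = {"listening": 15, "speaking": 15, "reading": 35, "writing": 35}


def overall_composite(scale_scores):
    """Combine four domain scale scores into a weighted overall composite.

    Rounding to a whole scale score is an assumption of this sketch.
    """
    weighted_sum = sum(WEIGHTS[domain] * scale_scores[domain] for domain in WEIGHTS)
    return round(weighted_sum / 100)


# Hypothetical student, for illustration only.
example = {"listening": 300, "speaking": 320, "reading": 280, "writing": 260}
print(overall_composite(example))
```

Because reading and writing together carry 70% of the weight, the composite for this hypothetical student sits closer to the literacy scores than to the oral scores.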
ACCESS administration times and composite score weights
Listening (15%): 20-25 minutes, machine scored
Reading (35%): 35-40 minutes, machine scored
Writing (35%): Up to 1 hour, rater scored
Speaking (15%): Up to 15 minutes, administrator scored
[Pie charts: Test Times (Minutes): Listening 25, Reading 40, Writing 60, Speaking 15. Test Weights (Percent): Listening 15%, Reading 35%, Writing 35%, Speaking 15%.]
Scale Scores vs. Proficiency Level Scores
The WIDA ACCESS for ELLs® Scale Scores are the psychometrically derived measures of student proficiency
  Range from 100 to 600
  One scale applies to all grades through vertical equating of tests
  Vertical scale score takes into account that assessment tasks taken by students in the grade 9-12 cluster are more challenging than the assessment tasks taken by students in the grade 1-2 cluster
  Average scale scores consistently show an increase from grade to grade
2005-2006 Overall Composite Scale Scores
[Figure: Average Overall Composite Scale Score by Grade; x-axis: Grade (1 to 12), y-axis: Scale Score (250 to 400)]
Scale Scores vs. Proficiency Level Scores
Proficiency Level Scores are socially-derived interpretations of the WIDA ACCESS for ELLs® Scale Scores in terms of the six proficiency levels defined in the WIDA Standards
  Composed of two numbers, e.g. 2.5
  First number indicates the proficiency level into which the student’s scale score places him or her (e.g. 2 = Beginning)
  Second number indicates how far, in tenths, the student’s scale score places him or her between the lower and the higher cut score of the proficiency level (e.g. 2.5 = 5/10 or ½ of the way between the cut score for level 2 and for level 3)
The same scale score is interpreted differently depending on the grade level cluster the student is in
The same proficiency level score corresponds to different scale scores depending on the grade level cluster
Example: Scale score of 350
                 Cut
Grades  Domain   1/2  2/3  3/4  4/5  5/6
1-2     Overall  259  285  313  332  354
3-5     Overall  292  325  350  370  394
6-8     Overall  319  347  374  393  410
9-12    Overall  347  373  396  412  429
[The slide marks where a scale score of 350 falls in each row.]
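The interpretation of a scale score of 350 in each cluster can be sketched as follows, using the overall composite cut scores from the example table. Several details are assumptions of this sketch and not statements from the slides: the tenths digit is truncated (not rounded), scores below the 1/2 cut are reported simply as 1.0 because the lower bound of level 1 is not given here, and scores at or above the 5/6 cut are reported as 6.0.

```python
# Overall composite cut scores from the example table, in the order
# 1/2, 2/3, 3/4, 4/5, 5/6.
CUTS = {
    "1-2": [259, 285, 313, 332, 354],
    "3-5": [292, 325, 350, 370, 394],
    "6-8": [319, 347, 374, 393, 410],
    "9-12": [347, 373, 396, 412, 429],
}


def proficiency_level_score(scale_score, cluster):
    """Interpret an integer scale score as a proficiency level score (e.g. 3.1)."""
    cuts = CUTS[cluster]
    if scale_score < cuts[0]:
        return 1.0  # assumption: position within level 1 is not computed here
    if scale_score >= cuts[-1]:
        return 6.0  # assumption: level 6 is reported without a tenths digit
    for i in range(len(cuts) - 1):
        lower, upper = cuts[i], cuts[i + 1]
        if lower <= scale_score < upper:
            level = i + 2  # the 1/2 cut is the lower bound of level 2
            # Tenths of the way from the lower cut to the upper cut (truncated).
            tenths = (scale_score - lower) * 10 // (upper - lower)
            return round(level + tenths / 10, 1)


for cluster in CUTS:
    print(cluster, proficiency_level_score(350, cluster))
```

Under these assumptions, the same scale score of 350 lands in level 5 for grades 1-2 but in level 2 for grades 9-12, which is the point the example slide makes.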
Example: Overall composite proficiency level score 6.0
[Figure: vertical scale from 100 to 600, running from easy items and less proficient students to hard items and more proficient students, marking the scale score that corresponds to a proficiency level score of 6.0 in each cluster: 1-2 = 354, 3-5 = 394, 6-8 = 410, 9-12 = 429]
How are proficiency level scores derived?
While Proficiency Level Scores are socially-derived interpretations, they are not arbitrary
  Set by panels of content experts
  Set following best technical practices
  Set by consensus building procedures (standard setting studies)
  Set by carefully documented replicable procedures
For WIDA ACCESS for ELLs®, these were set by panels of experts in April of 2004, for each grade level cluster (see WIDA Technical Report #1 for complete details)
Originally WIDA had grade level cluster cuts
[Figure: Overall Composite (Current Cuts); x-axis: Grade (0 to 12), y-axis: Scale Score (220 to 450); lines for the current 1/2, 2/3, 3/4, 4/5, and 5/6 cuts, separating proficiency levels 1 to 6]
Grade level cuts are being introduced this year
[Figure: Overall Composite (Proposed Smoothed Cuts); x-axis: Grade (0 to 12), y-axis: Scale Score (220 to 450); lines for the proposed 1/2, 2/3, 3/4, 4/5, and 5/6 cuts, separating proficiency levels 1 to 6]
Cluster vs. grade level cuts
[Figure: Overall Composite (Proposed vs. Current Cuts); x-axis: Grade (0 to 12), y-axis: Scale Score (220 to 450); lines for the current and proposed 1/2, 2/3, 3/4, 4/5, and 5/6 cuts]
2005-2006 Overall Composite Scale Scores
[Figure: Average Overall Composite Scale Score by Grade; x-axis: Grade (1 to 12), y-axis: Scale Score (250 to 400)]
Effect of grade level cut scores
[Figure: Proficiency Level Score of Average Overall Composite Scale Score by Grade; x-axis: Grade (1 to 12), y-axis: Proficiency Level Score (1.0 to 6.0); series: Current Cluster Cuts, Proposed Grade Level Cuts]
2. What do we know about the reliability of ACCESS for ELLs® scores?
What is reliability?
Psychometrically speaking, reliability refers to the consistency of test scores.
What evidence is there that this test score result is not just a chance occurrence, but would have been obtained had the student been tested on multiple occasions or had the responses been scored on multiple occasions?
Multiple forms of ACCESS for ELLs®
In the Annual Technical Report, the reliability of each of the 44 separate test forms for ACCESS for ELLs® is reported.
Cluster  List  Read  Write  Speak  Total
K        1     1     1      1      4
1-2      3     3     3      1      10
3-5      3     3     3      1      10
6-8      3     3     3      1      10
9-12     3     3     3      1      10
Total    13    13    13     5      44
Types of reliability reported
For all test forms, internal consistency (coefficient alpha) is reported.
For writing, agreement between operational raters is also reported (20%)
For speaking, agreement between administrators from field test data is also given currently, but a larger study is underway
Reliabilities for domain scores based on the individual forms for Series 100 (2004-2005) are within expected and acceptable ranges
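For readers unfamiliar with coefficient alpha, the standard formula is alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below implements that textbook formula on a small hypothetical response matrix. It is an illustration only; the function name and data are assumptions, and it is not a reproduction of the analysis in the Annual Technical Report.

```python
from statistics import pvariance


def coefficient_alpha(responses):
    """Coefficient (Cronbach's) alpha.

    responses: list of rows, one row per examinee, one column per item score.
    """
    k = len(responses[0])  # number of items
    # Variance of each item across examinees.
    item_variances = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of examinees' total scores.
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical right/wrong (1/0) responses of five examinees to four items.
hypothetical = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(coefficient_alpha(hypothetical), 3))
```

Alpha approaches 1 when items rank examinees consistently, and falls toward 0 when item scores are unrelated to one another.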
Reliability of the overall composite
Results indicate that the reliability of the overall composite score across tiers is similar and very high across all grade level clusters (Series 100).
K .930
1-2 .949
3-5 .941
6-8 .933
9-12 .936
The most important reliability index
For tests like ACCESS for ELLs®, where decisions are based on a student’s classification into proficiency levels, the accuracy of classification is perhaps the most important reliability index.
This index gives an estimate of how reliably a student was classified as at or above a certain category (versus below that category).
Accuracy of classification indices (Series 100)
       Grade Cluster
Cut    K     1-2   3-5   6-8   9-12
1/2    .925  .974  .977  .968  .951
2/3    .949  .943  .940  .936  .921
3/4    na    .928  .917  .912  .924
4/5    na    .943  .940  .945  .954
5/6    na    .975  .972  .976  .977
3. What do we know about the validity of ACCESS for ELLs® scores?
What is validity?
Validity refers to an evaluative judgment of the degree to which theoretical rationales and empirical evidence support the adequacy and appropriateness of inferences and actions made on the basis of test scores.
Validity issues for ACCESS for ELLs®
Issues related to ACCESS for ELLs® include
  Do the described proficiency levels exist?
  How does the test relate to other measures of English language proficiency?
  How confident are we in the cut scores that place students into the various levels, that they really define the levels?
  Do we know that ACCESS for ELLs® tests the language needed for academic success and is not a content test?
  And so on…
Study 1: Do the levels of the Standards really exist?
Reading and Listening Selected Response Type Items
  SI = Social and Instructional Language
  LA = language of Language Arts
  MA = language of Math
  SC = language of Science
  SS = language of Social Studies
The Standards guide test development
1. ACCESS for ELLs® makes the WIDA Standards operational
2. WIDA Standards provide
  a. Content (What?)
  b. Performance Levels (How well?)
Large-scale Standards: SC reading
Large-scale Standards: SC reading
Classify living organisms (such as birds and mammals) by using pictures or icons
Large-scale Standards: SC reading
Interpret data presented in text and tables in scientific studies
At the given level of English language proficiency, English language learners will process, understand, produce, or use:
5 - Bridging
  the technical language of the content areas;
  a variety of sentence lengths of varying linguistic complexity in extended oral or written discourse, including stories, essays, or reports;
  oral or written language approaching comparability to that of English proficient peers when presented with grade level material
4 - Expanding
  specific and some technical language of the content areas;
  a variety of sentence lengths of varying linguistic complexity in oral discourse or multiple, related paragraphs;
  oral or written language with minimal phonological, syntactic, or semantic errors that do not impede the overall meaning of the communication when presented with oral or written connected discourse with occasional visual and graphic support
3 - Developing
  general and some specific language of the content areas;
  expanded sentences in oral interaction or written paragraphs;
  oral or written language with phonological, syntactic, or semantic errors that may impede the communication but retain much of its meaning when presented with oral or written, narrative or expository descriptions with occasional visual and graphic support
2 - Beginning
  general language related to the content areas;
  phrases or short sentences;
  oral or written language with phonological, syntactic, or semantic errors that often impede the meaning of the communication when presented with one to multiple-step commands, directions, questions, or a series of statements with visual and graphic support
1 - Entering
  pictorial or graphic representation of the language of the content areas;
  words, phrases, or chunks of language when presented with one-step commands, directions, WH-questions, or statements with visual and graphic support

1: pictorial or graphic representation of the language of the content areas
2: general language of the content areas
5: technical language of the content areas
Validation issues
Validity is about the adequacy and appropriateness of inferences about students made on the basis of test scores.
The WIDA Standards make claims about what students at five different proficiency levels can do.
Can those claims be substantiated empirically?
Research study questions
1. Are the ACCESS for ELLs™ items empirically ordered by difficulty as predicted by the WIDA Standards?
2. Does that ordering differ by domain (listening or reading)?
3. Does that ordering differ by standard (SI, LA, MA, SC, SS)?
Data
Results from ACCESS for ELLs™ field test
Fall 2004
Over 6500 students grades 1 to 12
8 WIDA states
About 3.5% proportional representation
Method
Items were vertically scaled across grade levels using common item equating
Item difficulty was determined using the Rasch measurement model
Items that did not meet the requirements of the model were eliminated from the analysis
Average item difficulties were calculated by proficiency level
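The last step of the method, averaging item difficulties by targeted proficiency level and checking whether the averages rise with level, can be sketched as below. The item difficulties shown are hypothetical values on the scale-score metric, and the function names are illustrative; the Rasch calibration and equating steps themselves are not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def mean_difficulty_by_level(items):
    """items: iterable of (targeted_proficiency_level, item_difficulty) pairs."""
    by_level = defaultdict(list)
    for level, difficulty in items:
        by_level[level].append(difficulty)
    # Return the averages keyed by level, in ascending level order.
    return {level: mean(values) for level, values in sorted(by_level.items())}


def is_ordered(level_means):
    """True if average difficulty strictly increases with proficiency level."""
    values = list(level_means.values())
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical calibrated items, for illustration only.
items = [(1, 290), (1, 310), (2, 335), (2, 345), (3, 360), (3, 380)]
means = mean_difficulty_by_level(items)
print(means, is_ordered(means))
```

If the Standards order the items as predicted, the check returns True; a False result would flag a level whose items are on average no harder than those of the level below it.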
Number of items used = 651
                     WIDA Proficiency Level
Domain     Standard  1    2    3    4    5    Total
Listening  LA        6    14   19   15   11   65
           MA        9    18   27   27   14   95
           SC        5    12   16   15   10   58
           SI        13   17   22   11   5    68
           SS        7    13   19   12   6    57
           Total     40   74   103  80   46   343
Reading    LA        8    13   13   15   13   62
           MA        5    11   17   13   8    54
           SC        5    12   22   20   11   70
           SI        9    18   24   11   4    66
           SS        8    12   20   9    7    56
           Total     35   66   96   68   43   308
Results
[Figure: Item Difficulty (300 to 600) by WIDA Proficiency Level (1 to 5); Listening and Reading Combined, All Grade Level Clusters]
[Figure: Item Difficulty (300 to 600) by WIDA Proficiency Level (1 to 5), with separate series for Listening and Reading; All Grade Level Clusters (by Domain)]
[Figure: Item Difficulty (300 to 600) by WIDA Proficiency Level (1 to 5), with separate series for the standards LA, MA, SC, SI, and SS; Listening and Reading Combined, All Grade Level Clusters (by Standard)]
Conclusions
1. Are the ACCESS for ELLs™ items empirically ordered by difficulty as predicted by the WIDA Standards?
Yes. WIDA Standards (MPIs) provided sufficient content and rationale to develop specifications that operationalized the five proficiency levels through listening and reading selected response items.
2. Does that ordering differ by domain (listening or reading)?
No. The general ordering was similar across listening and reading. Some difference between listening level 5 and reading level 5 was observed.
3. Does that ordering differ by standard (SI, LA, MA, SC, SS)?
Yes. SI (social and instructional language) items showed a clear tendency to be easier than items assessing language in the content areas, particularly at higher proficiency levels.
Items assessing language in the content areas were similar except at level 5 where language arts appeared easier than expected.
Discussion
1. While many additional validation issues remain, this preliminary empirical analysis based on the field test data indicates that the WIDA Standards provide a strong basis for distinguishing among proficiency levels of ELLs.
Discussion
2. The operational plan for ongoing WIDA assessment item renewal and development provides opportunity to tighten item specifications based on empirical research while operationalizing the WIDA Standards.
Process of test development
1. Theory and Research
2. Standards
3. Specifications
4. Assessment
Study 2: Validation evidence from the bridge study
What can we learn about ACCESS for ELLs™ from the WIDA Consortium’s bridge study?
  Study 1: What is the relationship between performances on the older English language proficiency tests and on ACCESS for ELLs™?
  Study 2: What is the relationship between the “cut score” denoting the highest level of proficiency on the older tests and the predicted corresponding score on ACCESS for ELLs™ in terms of ACCESS proficiency levels?
Purpose of the bridge study
To help WIDA Consortium member states understand the performances of their ELLs in acquiring English on the older tests (for which they had data) in terms of the new test, especially to:
  meet compliance with Title III requirements
  provide continuity of data flow for cohorts of English language learners identified in 2002-03, the baseline year
  provide information that may help determine Annual Measurable Achievement Objectives (AMAOs) for the established cohorts in the transitional year
The older tests
IDEA Proficiency Test (IPT)
Language Assessment Scales (LAS)
Language Proficiency Test Series (LPTS)
Maculaitis II (MAC II)
NOTE: The first three tests do NOT have separate scores for listening and speaking!
WIDA levels of English Language Proficiency
[Figure: proficiency scale with levels 1 ENTERING, 2 BEGINNING, 3 DEVELOPING, 4 EXPANDING, 5 BRIDGING, and a marker at 4.5]
Participants
4,985 students from IL and RI
GRADE IPT LAS LPTS MAC II TOTAL
K 102 81 246 47 476
1 109 184 216 95 604
2 143 137 246 76 602
3 102 80 290 63 535
4 82 57 146 74 359
5 104 32 216 57 409
6 116 55 142 97 410
7 111 110 58 110 389
8 106 62 48 142 358
9 28 12 150 134 324
10 37 17 120 106 280
11 30 2 92 79 203
12 9 2 31 43 85
Total 1,079 831 1,952 1,123 4,985
Procedures
2005 operational ACCESS administration (AL, ME, VT)
Participating students in IL and RI were administered the older test and operational ACCESS within a 6-8 week window
Older tests were scored within local districts following their standard procedures, and the scores were submitted to the ACCESS scoring vendor
ACCESS was scored as part of Spring 2005 operational scoring
Data matched by ACCESS scoring vendor
Older test data cleaned at CAL
Analyses at CAL
Analyses: Study 1
Pearson correlations between performances on each form of older test (raw or scale score) and ACCESS for ELLs™ scale scores
Because each form for the older tests was unique, 64 correlational analyses were performed
IPT (14)
LAS (14)
LPTS (16)
MAC II (20)
Summarized by averaging
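The analysis described above, a Pearson correlation per test form followed by averaging, can be sketched as follows. The paired scores are hypothetical, and the function name is illustrative; the study's actual software is not named in the slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(ss_x * ss_y)


# Hypothetical paired scores for two forms of an older test (raw score)
# and the same students' ACCESS scale scores.
forms = {
    "form_A": ([10, 14, 18, 25, 30], [280, 300, 295, 330, 350]),
    "form_B": ([5, 9, 12, 20, 22], [260, 290, 270, 310, 340]),
}
correlations = {name: pearson_r(old, access) for name, (old, access) in forms.items()}
average_r = sum(correlations.values()) / len(correlations)
print(correlations, average_r)
```

A simple arithmetic mean of the per-form correlations is used here to mirror "summarized by averaging"; whether the study weighted forms by sample size is not stated.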
Results: Study 1 example (IPT Reading)
IPT Reading Score with ACCESS Reading Scale Score
Pearson correlation of IPT Read Raw Score with ACCESS Read Scale Score, by IPT form:
IPT Form (Read)  Pearson Correlation  N
IPT_EL           .741**               205
IPT_R_1AB        .540**               250
IPT_R_2AB        .618**               296
IPT_R_3AB        .713**               317
Results: Study 1 summary range
Average Correlations (All Levels of Each Test within Domain)
Test List Speak Read Write
IPT 0.601 0.625 0.653 0.631
LAS 0.503 0.570 0.591 0.525
LPTS 0.603 0.651 0.741 0.675
MAC II 0.433 0.453 0.593 0.509
Results: Study 1 summary by test across domains
Average Correlations (All Levels of Each Test within Domain)
Test List Speak Read Write
IPT 0.601 0.625 0.653 0.631
LAS 0.503 0.570 0.591 0.525
LPTS 0.603 0.651 0.741 0.675
MAC II 0.433 0.453 0.593 0.509
Results: Study 1 summary by domain across tests
Average Correlations (All Levels of Each Test within Domain)
Test List Speak Read Write
IPT 0.601 0.625 0.653 0.631
LAS 0.503 0.570 0.591 0.525
LPTS 0.603 0.651 0.741 0.675
MAC II 0.433 0.453 0.593 0.509
Discussion: Study 1
Generally moderate to high correlations between ACCESS for ELLs® and older tests; ACCESS appears to be assessing a similar construct (criterion-related validity) but is not interchangeable with the older tests
Correlations across all tests with reading were highest; most familiar to students and test developers?
Correlations across all tests with listening were lowest; but three tests did not have separate scores for listening and speaking!
Correlations across domains between LPTS and ACCESS for ELLs® were highest; LPTS the newest of the ‘older generation’
Analyses: Study 2
From predicted scores tables, found for each grade level the ACCESS for ELLs® proficiency level score corresponding to the “cut score” of the highest proficiency level on the older test
Summarized findings by calculating averages and standard deviations
Predicted scores table example
Predicted ACCESS = 270.1 + 1.738 * LAS
LAS RW 2AB Writing Raw Score to WIDA ACCESS Writing Scale Score
LAS RW 2AB  LAS Proficiency Level  Predicted     ACCESS Proficiency Level (by grade)
Raw Score   (Writing, grades       ACCESS Score  Writing,      Writing,
            4, 5, 6)                             grades 4, 5   grade 6
0           1                      270           1.9           1.8
1           1                      272           1.9           1.8
2           1                      274           1.9           1.8
3           1                      275           1.9           1.8
4           1                      277           1.9           1.9
5           1                      279           1.9           1.9
…           …                      …             …             …
27          1                      317           2.9           2.4
28          1                      319           3.0           2.4
29          1                      321           3.1           2.5
30          1                      322           3.1           2.5
31          1                      324           3.1           2.5
32          1                      326           3.1           2.5
33          2                      327           3.1           2.5
…           …                      …             …             …
55          3                      366           4.5           3.7
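The predicted ACCESS scores in the table follow directly from the regression equation shown on the slide. A minimal sketch, assuming the prediction is rounded to the nearest whole scale score (which is consistent with the table rows, though the slides do not state the rounding rule):

```python
# Regression coefficients from the slide:
# Predicted ACCESS = 270.1 + 1.738 * LAS
INTERCEPT = 270.1
SLOPE = 1.738


def predicted_access(las_raw_score):
    """Predicted ACCESS writing scale score for a LAS RW 2AB writing raw score.

    Rounding to the nearest whole number is an assumption of this sketch.
    """
    return round(INTERCEPT + SLOPE * las_raw_score)


for raw in (0, 5, 27, 28, 55):
    print(raw, predicted_access(raw))
```

The predicted scale score is then interpreted against the ACCESS writing cut scores for the student's grade to obtain the proficiency level columns; those cut scores are not listed in this presentation, so that step is not sketched here.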
Finding the WIDA proficiency level score example
Predicted ACCESS = 270.1 + 1.738 * LAS
LAS RW 2AB Writing Raw Score to WIDA ACCESS Writing Scale Score
LAS RW 2AB  LAS Proficiency Level  Predicted     ACCESS Proficiency Level (by grade)
Raw Score   (Writing, grades       ACCESS Score  Writing,      Writing,
            4, 5, 6)                             grades 4, 5   grade 6
…           …                      …             …             …
42          2                      343           3.8           3.2
43          2                      345           3.9           3.2
44          3                      347           3.9           3.3
45          3                      348           3.9           3.3
46          3                      350           4             3.3
…           …                      …             …             …
Truncated example results: Listening
Listening  K    1    2    3    …  11   12
IPT        6.0  6.0  6.0  4.3  …  4.8  4.8
LAS        4.7  6.0  6.0  4.7  …  4.4  4.4
LPTS       3.1  3.4  3.8  3.7  …  3.0  3.0
MAC II     3.3  3.3  5.2  3.3  …  2.9  2.9
Results: Study 2 summary range
Average Proficiency Level Score (Standard Deviation)
Test    List        Speak       Read        Write
IPT     4.9 (0.80)  4.0 (0.36)  3.9 (0.97)  2.9 (0.64)
LAS     4.8 (0.67)  5.1 (0.81)  3.1 (1.11)  3.1 (0.67)
LPTS    3.5 (0.53)  2.9 (0.79)  5.3 (0.71)  3.9 (0.74)
MAC II  3.7 (0.78)  3.5 (0.74)  3.5 (0.76)  3.0 (0.40)
Interpretation: Highest test and domain
[Figure: proficiency scale with levels 1 ENTERING, 2 BEGINNING, 3 DEVELOPING, 4 EXPANDING, 5 BRIDGING, with LPTS Reading marked]
Interpretation: Lowest test and domain
[Figure: proficiency scale with levels 1 ENTERING, 2 BEGINNING, 3 DEVELOPING, 4 EXPANDING, 5 BRIDGING, with LPTS Reading and IPT Writing marked]
Results: Study 2 High and low by test across domains
Average Proficiency Level Score (Standard Deviation)
Test    List        Speak       Read        Write
IPT     4.9 (0.80)  4.0 (0.36)  3.9 (0.97)  2.9 (0.64)
LAS     4.8 (0.67)  5.1 (0.81)  3.1 (1.11)  3.1 (0.67)
LPTS    3.5 (0.53)  2.9 (0.79)  5.3 (0.71)  3.9 (0.74)
MAC II  3.7 (0.78)  3.5 (0.74)  3.5 (0.76)  3.0 (0.40)
Results: Study 2 High and low by domain across tests
Average Proficiency Level Score (Standard Deviation)
Test    List        Speak       Read        Write
IPT     4.9 (0.80)  4.0 (0.36)  3.9 (0.97)  2.9 (0.64)
LAS     4.8 (0.67)  5.1 (0.81)  3.1 (1.11)  3.1 (0.67)
LPTS    3.5 (0.53)  2.9 (0.79)  5.3 (0.71)  3.9 (0.74)
MAC II  3.7 (0.78)  3.5 (0.74)  3.5 (0.76)  3.0 (0.40)
Discussion: Study 2 (1 of 3)
Results varied widely, from a close relationship to the WIDA proficiency span (LPTS Reading) to much lower, though in general “cut scores” on older tests tended to be much lower than the WIDA 6.0
  Were ELLs exited too early under the older tests?
  Do ACCESS for ELLs™ standards and performance level definitions better align with levels of English proficiency needed for academic success?
  With a single test across districts within a state, states will have clearer data to better understand the development of English proficiency in ELLs and its relationship to academic achievement
Discussion: Study 2 (2 of 3)
Results varied widely across tests and domains; LPTS, with the highest “cut scores” in reading and writing, had the lowest “cut scores” in listening and speaking
  But three tests did not have separate scores for listening and speaking, including LPTS!
  LPTS had only “fluent”/“non-fluent” listening and speaking categories?
Discussion: Study 2 (3 of 3)
Across tests, writing had the lowest “cut scores” for three of four tests; is writing on ACCESS for ELLs™ unduly hard, or is it more indicative of what is needed for academic success?
Important considerations in interpretations
CONTENT differences between all five tests include:
  Degree of alignment with English language proficiency and academic content standards
  Number and types of items in each subsection or language domain
  Depth of knowledge of the items
  Inclusion of the language of math, science, and social studies
  Ceiling levels of the measures
  Rubrics used for interpreting speaking and writing
METHODOLOGICAL caveats include:
  Use of linear regression across all analyses
  Sometimes small numbers of students in subgroups
  Distribution of observed scores (Spring testing)
Preliminary conclusions
Correlational data show strong support for ACCESS for ELLs™ as a measure of English proficiency (criterion-related validity)
Comparison of “cut scores” indicates that the WIDA Standards, as operationalized by ACCESS for ELLs™, describe a longer proficiency continuum than the older tests
Additional studies are needed to explore the relationship between that extended continuum and academic achievement
Validity evidence from the grade level cut score review study
75 teachers from 14 WIDA states
Examined test items and (for writing and speaking) examinee performances in light of the WIDA Standards’ model Performance Indicators and the Standards’ performance level descriptors
Through a structured process, came up with proposed grade level cut scores (starting from empirically proposed scores derived from the current cluster level cut scores)
As in the original standard setting study, evaluated the confidence they had in the cut scores representing the different performance levels
Results: Confidence increased greatly over the first study
Evaluations from grade level cut score review
Averages across all participants
How confident are you in the cut scores? (4 = hi, 1 = lo)
Red = below 3.10 / Black = 3.11 to 3.40 / Green = above 3.40
      Read        Write       List        Speak
Cut   Orig  Rev   Orig  Rev   Orig  Rev   Orig  Rev
1/2   3.08  3.41  3.39  3.46  3.22  3.51  3.24  3.46
2/3   2.83  3.47  3.28  3.43  3.15  3.55  3.01  3.39
3/4   2.98  3.48  3.33  3.36  3.17  3.57  2.89  3.37
4/5   3.05  3.54  3.33  3.35  3.19  3.53  2.84  3.37
5/6   3.01  3.52  3.33  3.41  3.18  3.60  2.97  3.56
83ISBE Presentation 2/21/2007© 2007 WIDA/CAL
Other validity studies underway at CAL
Some ongoing internal research at CAL
(1) What do we learn from the results of the technical analyses of Series 100 to improve item and form specifications?
(2) How do we improve the construction of items appropriate (both from content and empirical results) to their targeted proficiency levels?
(3) What evidence do we have that ACCESS for ELLs tests the language of the content areas and not knowledge of the content areas?
#1 Example from Series 100 analyses
Figure 8.3.1D. Test Information Function: List 3-5 ABC
[Figure: Information (y-axis, 0 to 6) plotted against Ability Measure (x-axis, -8 to 7)]
#1 Example from Series 100 analyses
Figure 8.3.2D. Test Information Function: Read 3-5 ABC
[Figure: Information (y-axis, 0 to 7) plotted against Ability Measure (x-axis, -8 to 7)]
#2 Example 3-5 Read Prof Level 2
5.  R_2526_SIp2g35_PlantSale      2  302  1
1.  R_2999_SIp2g35_FamilyNight    2  272  1
3.  R_2675_SIp2g35_Artwork        2  327  2
9.  R_2871_LAp2g35_AngelaPepper   2  321  2
10. R_2172_LAp2g35_KangarooDream  2  329  2
8.  R_2870_LAp2g35_AngelaPepper   2  335  3
9.  R_2171_LAp2g35_KangarooDream  2  365  5
#2 Example 3-5 Read Prof Level 5
13. R_2535_LAp5g35_AthleteBio       5  271  1
3.  R_2541_SIp5g35_PlaygroundRules  5  332  3
12. R_2534_LAp5g35_AthleteBio       5  373  5
Interaction of Performance Level Descriptions and model Performance Indicators
[Diagram labels]
Language Proficiency (Performance Level Descriptions): 1 Entering, 2 Beginning, 3 Developing, 4 Expanding, 5 Bridging
PIs: L1, L2, L3, L4, L5
Linguistic Complexity, Vocabulary Usage, Language Control
#3 Confirmatory Factor Analyses (SEM)
[Path diagram; node labels: RSS, RSI, RLA, RMA, RSC; LSI, LSS, LLA, LMA, LSC; ListScore, ReadScore; L-prof, Engprof, R-prof; SS, SC, MA, LA, SI]
Other research (and possibilities)
1. Native speaker studies (Alabama data)
2. Relationship between performance on ACCESS for ELLs and state content tests (?)
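Logistic regression with state data?
[Figure: sketch of a logistic curve. x-axis: ACCESS Scale Score (low to high); y-axis: Probability (lo% to hi%, with 80% marked). Individual cases appear as X marks in a "Yes" row, clustered at higher scores, and a "No" row, clustered at lower scores.]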
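The idea in the figure can be sketched as follows: once a logistic regression of a yes/no outcome on the ACCESS scale score has been fitted, the curve can be inverted to find the scale score at which the predicted probability of "Yes" reaches a target such as 80%. The slide does not say which outcome would be used. The coefficients `INTERCEPT` and `SLOPE` below are hypothetical placeholders, not estimates from any state data.

```python
import math


def probability(score, intercept, slope):
    """Predicted probability of a 'Yes' outcome at a given scale score."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def score_for_probability(target, intercept, slope):
    """Scale score at which the predicted probability equals target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if slope == 0:
        raise ValueError("slope must be non-zero")
    logit = math.log(target / (1.0 - target))
    return (logit - intercept) / slope


# Hypothetical fitted coefficients (illustration only).
INTERCEPT = -17.5
SLOPE = 0.05

if __name__ == "__main__":
    cut = score_for_probability(0.80, INTERCEPT, SLOPE)
    print(f"Scale score with an 80% predicted probability: {cut:.1f}")
    print(f"Check: p = {probability(cut, INTERCEPT, SLOPE):.2f}")
```

With real data the coefficients would come from a fitted model, and the resulting score would carry sampling uncertainty that this sketch ignores.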
4. So what does this mean for using scores on ACCESS for ELLs®?
Be sure to understand the meaning of scale scores and proficiency level scores
Have confidence using scores, knowing that
  the reliability (consistency) of the scale scores is high, in particular for the overall composite score
  the accuracy of classification based on the overall composite is also high
  initial validity studies strongly support the use of ACCESS for ELLs® test scores as a valid indicator of levels of proficiency in accordance with the WIDA Standards
  the WIDA Consortium supports a rigorous program of ongoing test improvement, supported by research
  the WIDA Consortium continues to collect evidence in support of the validity of the use of test scores
For more information, please contact the WIDA Hotline: 1-866-276-7735 or www.wida.us/helpform
World Class Instructional Design and Assessment, www.wida.us
Center for Applied Linguistics, www.cal.org
Metritech, Inc., www.metritech.com