
ACCESS for ELLs® Scores, Reliability and Validity

Developed by the Center for Applied Linguistics

Prepared by Dorry Kenyon, CAL

ISBE Meeting, Chicago, IL

February 21, 2007

ISBE Presentation 2/21/2007 © 2007 WIDA/CAL

Outline of my presentation

1. What do scores on ACCESS for ELLs® mean?

2. What do we know about the reliability of ACCESS for ELLs® scores?

3. What do we know about the validity of ACCESS for ELLs® scores?

4. So what does this mean for using scores on ACCESS for ELLs®?


1. What do scores on ACCESS for ELLs® mean?


Two types of scores

WIDA ACCESS for ELLs® Scale Scores = psychometrically-derived measure

WIDA ACCESS for ELLs® Proficiency Level Scores = socially-derived interpretation of the scale score in terms of the WIDA Standards’ Proficiency Level Definitions


What is measured?

Scale Scores (and interpretive Proficiency Level Scores) are given for measures in the four domains:

Listening

Speaking

Reading

Writing

Scale Scores are combined into four composite scores (which are also interpreted as Proficiency Level Scores):

Oral (listening and speaking)

Literacy (reading and writing)

Comprehension (listening and reading)

Overall Composite (listening, speaking, reading, and writing)


Weighting of the overall composite

Scale Scores of the four domains are weighted differently in the overall composite score:

Listening (15%)

Speaking (15%)

Reading (35%)

Writing (35%)
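A minimal sketch of how the weights above combine the four domain scale scores, assuming the overall composite is a plain weighted average. The function name is hypothetical, and any rounding or truncation applied to reported composites is not modeled.

```python
# Overall composite weights in percent, as listed on the slide.
OVERALL_WEIGHTS = {"listening": 15, "speaking": 15, "reading": 35, "writing": 35}


def overall_composite(scale_scores: dict) -> float:
    """Weighted mean of the four domain scale scores (hypothetical helper).

    Assumes a plain weighted average; reporting rules such as rounding
    are not modeled here.
    """
    if set(scale_scores) != set(OVERALL_WEIGHTS):
        raise ValueError("expected exactly the four domain scale scores")
    weighted_sum = sum(OVERALL_WEIGHTS[d] * scale_scores[d] for d in OVERALL_WEIGHTS)
    return weighted_sum / 100


# Hypothetical student: stronger in reading and writing than in the oral domains.
print(overall_composite({"listening": 300, "speaking": 300, "reading": 400, "writing": 400}))  # 370.0
```

Because reading and writing together carry 70% of the weight, raising both by 100 points moves the composite 70 points, while the same change in listening and speaking moves it only 30.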


ACCESS administration times and composite score weights

Listening (15%): 20-25 minutes, machine scored

Reading (35%): 35-40 minutes, machine scored

Writing (35%): Up to 1 hour, rater scored

Speaking (15%): Up to 15 minutes, administrator scored

[Pie charts: Test Times (Minutes): Listening 25, Reading 40, Writing 60, Speaking 15. Test Weights (Percent): Listening 15%, Reading 35%, Writing 35%, Speaking 15%.]


Scale Scores vs. Proficiency Level Scores

The WIDA ACCESS for ELLs® Scale Scores are the psychometrically derived measures of student proficiency:

Range from 100 to 600

One scale applies to all grades through vertical equating of tests

The vertical scale score takes into account that the assessment tasks taken by students in the grade 9-12 cluster are more challenging than the assessment tasks taken by students in the grade 1-2 cluster

Average scale scores consistently show an increase from grade to grade
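The slides do not describe the equating design. As an illustration only, under the Rasch model one simple way to put two separately calibrated forms on a single vertical scale is mean-mean linking through the items they share: shift the new form's difficulties by the difference between the shared items' means. The item IDs, logit values, and function names below are invented for the sketch.

```python
from statistics import mean


def mean_shift_link(base: dict, new: dict) -> float:
    """Linking constant that moves `new`-form difficulties onto the `base` scale.

    Rasch mean-mean linking: only items calibrated on both forms are used.
    """
    common = base.keys() & new.keys()
    if not common:
        raise ValueError("the forms share no common items")
    return mean(base[i] for i in common) - mean(new[i] for i in common)


def put_on_base_scale(new: dict, shift: float) -> dict:
    """Add the linking constant to every item difficulty on the new form."""
    return {item: b + shift for item, b in new.items()}


# Hypothetical logit difficulties from two separate calibrations sharing c1 and c2.
lower_form = {"a1": -1.0, "c1": 0.5, "c2": 1.5}
upper_form = {"c1": -0.5, "c2": 0.5, "u1": 1.0}

shift = mean_shift_link(lower_form, upper_form)
print(shift)  # 1.0
print(put_on_base_scale(upper_form, shift))  # {'c1': 0.5, 'c2': 1.5, 'u1': 2.0}
```

After linking, the upper form's unique item u1 sits at 2.0 logits on the lower form's scale, above every lower-form item. A scale spanning several clusters would chain links like this and then map logits to a reporting metric; the mapping to the 100 to 600 ACCESS scale is not given here.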


2005-2006 Overall Composite Scale Scores

[Figure: Average Overall Composite Scale Score by Grade (x-axis: grade, 1 to 12; y-axis: scale score, 250 to 400).]


Scale Scores vs. Proficiency Level Scores

Proficiency Level Scores are socially-derived interpretations of the WIDA ACCESS for ELLs® Scale Scores in terms of the six proficiency levels defined in the WIDA Standards. They are composed of two numbers, e.g., 2.5:

The first number indicates the proficiency level into which the student’s scale score places him or her (e.g., 2 = Beginning)

The second number indicates how far, in tenths, the student’s scale score places him or her between the lower and the higher cut score of that proficiency level (e.g., 2.5 = 5/10 or ½ of the way between the cut score for level 2 and the cut score for level 3)

The same scale score is interpreted differently based on what grade level cluster different students are in

The same proficiency level score corresponds to different scale scores based on the grade level cluster


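Example: Scale score of 350

Overall composite cut scores by grade level cluster:

Grades Domain 1/2 2/3 3/4 4/5 5/6

1-2 Overall 259 285 313 332 354

3-5 Overall 292 325 350 370 394

6-8 Overall 319 347 374 393 410

9-12 Overall 347 373 396 412 429

[Figure: a scale score of 350 marked against the cut scores of each grade level cluster.]

Example: Overall composite proficiency level score 6.0

[Figure: a single scale from 100 to 600 (ticks at 100, 225, 350, 475, 600), with easy items and less proficient students toward the low end and hard items and more proficient students toward the high end, marking the scale score at which each grade level cluster reaches an overall composite proficiency level score of 6.0: 1-2 = 354, 3-5 = 394, 6-8 = 410, 9-12 = 429.]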
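The two-number proficiency level score can be sketched as a lookup followed by linear interpolation between cut scores, using the overall composite cuts from the scale score 350 example. This sketch assumes the decimal is truncated (not rounded) to tenths, that a score at or above the 5/6 cut is reported as 6.0, and that level 1 uses the bottom of the scale (100) as its lower bound; those conventions and the function name are assumptions, not WIDA's published procedure.

```python
import math

# Overall composite cut scores (1/2, 2/3, 3/4, 4/5, 5/6) by grade level cluster.
OVERALL_CUTS = {
    "1-2": [259, 285, 313, 332, 354],
    "3-5": [292, 325, 350, 370, 394],
    "6-8": [319, 347, 374, 393, 410],
    "9-12": [347, 373, 396, 412, 429],
}
SCALE_MIN = 100  # assumed lower bound of level 1 (bottom of the 100-600 scale)


def proficiency_level_score(scale_score: float, cuts: list) -> float:
    """Convert a scale score into a proficiency level score such as 2.5."""
    if scale_score < SCALE_MIN:
        raise ValueError("scale score below the bottom of the scale")
    if scale_score >= cuts[-1]:
        return 6.0  # at or above the 5/6 cut (assumed to be reported as 6.0)
    bounds = [SCALE_MIN] + list(cuts)
    # Level = how many lower bounds the score has reached (1 to 5 here).
    level = sum(1 for b in bounds if scale_score >= b)
    lower, upper = bounds[level - 1], bounds[level]
    # Tenths of the way from the lower cut to the upper cut, truncated (assumption).
    tenths = math.floor(10 * (scale_score - lower) / (upper - lower))
    return round(level + tenths / 10, 1)


for cluster, cuts in OVERALL_CUTS.items():
    print(cluster, proficiency_level_score(350, cuts))
# prints, one per line: 1-2 5.8, 3-5 4.0, 6-8 3.1, 9-12 2.1
```

The same scale score of 350 yields four different proficiency level scores, one per cluster, which is the point of the example; rounding instead of truncating gives the same four values here.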


How are proficiency level scores derived?

While Proficiency Level Scores are socially-derived interpretations, they are not arbitrary. They are:

Set by panels of content experts

Set following best technical practices

Set by consensus building procedures (standard setting studies)

Set by carefully documented, replicable procedures

For WIDA ACCESS for ELLs®, these were set by panels of experts in April of 2004, for each grade level cluster (see WIDA Technical Report #1 for complete details)


Originally WIDA had grade level cluster cuts

[Figure: Overall Composite (Current Cuts). Cut scores current1/2 through current5/6 plotted by grade (x-axis: grade, 0 to 12; y-axis: scale score, 220 to 450), with the regions between cuts labeled 1 to 6.]


Grade level cuts are being introduced this year

[Figure: Overall Composite (Proposed Smoothed Cuts). Cut scores proposed1/2 through proposed5/6 plotted by grade (x-axis: grade, 0 to 12; y-axis: scale score, 220 to 450), with the regions between cuts labeled 1 to 6.]


Cluster vs. grade level cuts

[Figure: Overall Composite (Proposed vs. Current Cuts). Current cluster cuts (current1/2 to current5/6) and proposed grade level cuts (proposed1/2 to proposed5/6) plotted together by grade (x-axis: grade, 0 to 12; y-axis: scale score, 220 to 450).]




Effect of grade level cut scores

[Figure: Prof Level Score of Average Overall Composite Scale Score by Grade, comparing Current Cluster Cuts with Proposed Grade Level Cuts (x-axis: grade, 1 to 12; y-axis: proficiency level score, 1.0 to 6.0).]





2. What do we know about the reliability of ACCESS for ELLs® scores?

What is reliability?

Psychometrically speaking, reliability refers to the consistency of test scores.

What evidence is there that a student’s test score is not just a chance occurrence, but would have been obtained had the student been tested on other occasions, or had the responses been scored on other occasions?


Multiple forms of ACCESS for ELLs®

In the Annual Technical Report, the reliability of each of the 44 separate test forms for ACCESS for ELLs® is reported.

Cluster List Read Write Speak Total

K 1 1 1 1 4

1-2 3 3 3 1 10

3-5 3 3 3 1 10

6-8 3 3 3 1 10

9-12 3 3 3 1 10

Total 13 13 13 5 44


Types of reliability reported

For all test forms, internal consistency (coefficient alpha) is reported.

For writing, agreement between operational raters is also reported (20%)

For speaking, agreement between administrators is currently reported from field test data; a larger study is underway

Reliabilities for domain scores based on the individual forms for Series 100 (2004-2005) are within expected and acceptable ranges
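Coefficient alpha, the internal-consistency index reported for every form, compares the sum of the item variances with the variance of students' total scores. A minimal sketch for a students-by-items score matrix, using population variances; the response data are invented.

```python
from statistics import pvariance


def coefficient_alpha(scores: list) -> float:
    """Cronbach's coefficient alpha for a students-by-items matrix of item scores."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 students (rows) by 3 dichotomously scored items (columns).
example_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(coefficient_alpha(example_scores), 3))  # 0.794
```

Alpha approaches 1 when items rise and fall together across students and drops toward 0 when item scores are unrelated, which is why it is read as consistency within a form.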


Reliability of the overall composite

Results indicate that the reliability of the overall composite score across tiers is similar and very high across all grade level clusters (Series 100).

Cluster Reliability

K .930

1-2 .949

3-5 .941

6-8 .933

9-12 .936
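The slides do not say how the composite reliability was computed. One standard approach for a weighted composite, often attributed to Mosier and assuming measurement errors are uncorrelated across domains, treats the composite's error variance as the weighted sum of the domain error variances. In this sketch the weights are the 15/15/35/35 overall composite weights, but the domain reliabilities and covariances are invented.

```python
def composite_reliability(weights, reliabilities, cov):
    """Reliability of a weighted composite (Mosier-style).

    weights:       composite weight for each domain
    reliabilities: reliability of each domain score
    cov:           observed-score covariance matrix of the domain scores
    Assumes measurement errors are uncorrelated across domains.
    """
    n = len(weights)
    composite_var = sum(weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n))
    error_var = sum(weights[i] ** 2 * cov[i][i] * (1 - reliabilities[i]) for i in range(n))
    return 1 - error_var / composite_var


# Invented inputs: overall composite weights, made-up domain reliabilities, and a
# covariance matrix with SD 40 for every domain and all correlations equal to 0.6.
WEIGHTS = [0.15, 0.15, 0.35, 0.35]  # listening, speaking, reading, writing
DOMAIN_REL = [0.80, 0.85, 0.88, 0.86]
COV = [[1600.0 if i == j else 960.0 for j in range(4)] for i in range(4)]
print(round(composite_reliability(WEIGHTS, DOMAIN_REL, COV), 3))
```

With these invented inputs the composite comes out above every individual domain reliability: the domains share true-score variance, while their errors are assumed not to add up systematically.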


The most important reliability index

For tests like ACCESS for ELLs®, where decisions are based on a student’s classification into proficiency levels, the accuracy of classification is perhaps the most important reliability index.

This index estimates how reliably a student was classified as at or above a given proficiency level (versus below that level).


Accuracy of classification indices (Series 100)

Grade Cluster

Cut K 1-2 3-5 6-8 9-12

1/2 .925 .974 .977 .968 .951

2/3 .949 .943 .940 .936 .921

3/4 na .928 .917 .912 .924

4/5 na .943 .940 .945 .954

5/6 na .975 .972 .976 .977
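The slides do not state which method produced these indices. As one illustration, an IRT-based approach in the spirit of Rudner treats each student's true ability as normally distributed around the estimate with its standard error, and averages the probability that the true ability falls on the same side of the cut as the estimate. The abilities, standard errors, and cut below are hypothetical logit values.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def classification_accuracy(estimates, std_errors, cut):
    """Expected proportion of students classified on the correct side of `cut`.

    Each student's true ability is assumed normal around the estimate,
    with the given standard error.
    """
    probs = []
    for theta, se in zip(estimates, std_errors):
        p_above = 1 - normal_cdf((cut - theta) / se)  # P(true ability >= cut)
        probs.append(p_above if theta >= cut else 1 - p_above)
    return sum(probs) / len(probs)


# Hypothetical ability estimates (logits), their standard errors, and a cut at 0.0.
estimates = [-1.2, -0.3, 0.1, 0.8, 1.5]
std_errors = [0.35] * len(estimates)
print(round(classification_accuracy(estimates, std_errors, 0.0), 3))
```

Students whose estimates sit close to a cut pull the index down, which is one reason accuracy is reported separately for each cut (1/2 through 5/6).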





3. What do we know about the validity of ACCESS for ELLs® scores?

What is validity?

Validity refers to an evaluative judgment of the degree to which theoretical rationales and empirical evidence support the adequacy and appropriateness of inferences and actions made on the basis of test scores.


Validity issues for ACCESS for ELLs®

Issues related to ACCESS for ELLs® include:

Do the described proficiency levels exist?

How does the test relate to other measures of English language proficiency?

How confident are we that the cut scores placing students into the various levels really define those levels?

Do we know that ACCESS for ELLs® tests the language needed for academic success and is not a content test?

And so on…


Study 1: Do the levels of the Standards really exist?

Reading and Listening Selected Response Type Items

SI = Social and Instructional Language

LA = language of Language Arts

MA = language of Math

SC = language of Science

SS = language of Social Studies


The Standards guide test development

1. ACCESS for ELLs® makes the WIDA Standards operational

2. WIDA Standards provide

a. Content (What?)

b. Performance Levels (How well?)


Large-scale Standards: SC reading


Large-Scale standards: SC reading

Classify living organisms (such as birds and mammals) by using pictures or icons


Large-scale Standards: SC reading

Interpret data presented in text and tables in scientific studies


At the given level of English language proficiency, English language learners will process, understand, produce, or use:

5- Bridging

the technical language of the content areas; a variety of sentence lengths of varying linguistic complexity in extended oral or written discourse, including stories, essays, or reports; oral or written language approaching comparability to that of English proficient peers when presented with grade level material

4- Expanding

specific and some technical language of the content areas; a variety of sentence lengths of varying linguistic complexity in oral discourse or multiple, related paragraphs; oral or written language with minimal phonological, syntactic, or semantic errors that do not impede the overall meaning of the communication when presented with oral or written connected discourse with occasional visual and graphic support

3- Developing

general and some specific language of the content areas; expanded sentences in oral interaction or written paragraphs; oral or written language with phonological, syntactic, or semantic errors that may impede the communication but retain much of its meaning when presented with oral or written, narrative or expository descriptions with occasional visual and graphic support

2- Beginning

general language related to the content areas; phrases or short sentences; oral or written language with phonological, syntactic, or semantic errors that often impede the meaning of the communication when presented with one to multiple-step commands, directions, questions, or a series of statements with visual and graphic support

1- Entering

pictorial or graphic representation of the language of the content areas; words, phrases, or chunks of language when presented with one-step commands, directions, WH-questions, or statements with visual and graphic support

1: pictorial or graphic representation of the language of the content areas

2: general language of the content areas

5: technical language of the content areas


Validation issues

Validity is about the adequacy and appropriateness of inferences about students made on the basis of test scores.

The WIDA Standards make claims about what students at five different proficiency levels can do.

Can those claims be substantiated empirically?


Research study questions

1. Are the ACCESS for ELLs™ items empirically ordered by difficulty as predicted by the WIDA Standards?

2. Does that ordering differ by domain (listening or reading)?

3. Does that ordering differ by standard (SI, LA, MA, SC, SS)?


Data

Results from ACCESS for ELLs™ field test

Fall 2004

Over 6500 students grades 1 to 12

8 WIDA states

About 3.5% proportional representation


Method

Items were vertically scaled across grade levels using common item equating

Item difficulty was determined using the Rasch measurement model

Items that did not meet the requirements of the model were eliminated from the analysis

Average item difficulties were calculated by proficiency level
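The last two steps of the method can be sketched as follows: the Rasch model gives the probability of a correct response from the gap between ability and item difficulty, and research question 1 then reduces to whether the average difficulty rises with the targeted proficiency level. Item calibration itself is not shown; the item records below are invented and placed on a scale-score-like metric like the one in the Results charts.

```python
from collections import defaultdict
from math import exp
from statistics import mean


def rasch_p_correct(theta: float, b: float) -> float:
    """Rasch model: P(correct) for ability theta and item difficulty b (logits)."""
    return 1 / (1 + exp(-(theta - b)))


def mean_difficulty_by_level(items):
    """items: iterable of (targeted_level, difficulty). Returns {level: mean difficulty}."""
    by_level = defaultdict(list)
    for level, b in items:
        by_level[level].append(b)
    return {level: mean(bs) for level, bs in sorted(by_level.items())}


def ordered_as_predicted(level_means) -> bool:
    """True if average difficulty strictly increases with proficiency level."""
    values = [level_means[k] for k in sorted(level_means)]
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical items: (targeted WIDA level, difficulty).
items = [(1, 310), (1, 330), (2, 350), (2, 372), (3, 390), (3, 402), (4, 418), (5, 455), (5, 441)]
means = mean_difficulty_by_level(items)
print(means)
print(ordered_as_predicted(means))
```

The same grouping can be repeated per domain or per standard (SI, LA, MA, SC, SS) to address research questions 2 and 3.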


Number of items used = 651

WIDA Proficiency Level

Domain Standard 1 2 3 4 5 Total

Listening LA 6 14 19 15 11 65

Listening MA 9 18 27 27 14 95

Listening SC 5 12 16 15 10 58

Listening SI 13 17 22 11 5 68

Listening SS 7 13 19 12 6 57

Listening Total 40 74 103 80 46 343

Reading LA 8 13 13 15 13 62

Reading MA 5 11 17 13 8 54

Reading SC 5 12 22 20 11 70

Reading SI 9 18 24 11 4 66

Reading SS 8 12 20 9 7 56

Reading Total 35 66 96 68 43 308


Results

[Figure: Item Difficulty (y-axis, 300 to 600) by WIDA Proficiency Level (x-axis, 1 to 5), Listening and Reading Combined, All Grade Level Clusters.]

[Figure: Item Difficulty by WIDA Proficiency Level for each Domain (Listening, Reading), All Grade Level Clusters (by Domain).]

[Figure: Item Difficulty by WIDA Proficiency Level for each Standard (LA, MA, SC, SI, SS), Listening and Reading Combined, All Grade Level Clusters (by Standard).]


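Conclusions

1. Are the ACCESS for ELLs™ items empirically ordered by difficulty as predicted by the WIDA Standards?

Yes. The WIDA Standards (model performance indicators, MPIs) provided sufficient content and rationale to develop specifications that operationalized the five proficiency levels through listening and reading selected response items.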


2. Does that ordering differ by domain (listening or reading)?

No. The general ordering was similar across listening and reading. Some difference between listening level 5 and reading level 5 was observed.


3. Does that ordering differ by standard (SI, LA, MA, SC, SS)?

Yes. SI (social and instructional language) items showed a clear tendency to be easier than items assessing language in the content areas, particularly at higher proficiency levels.

Items assessing language in the content areas were similar except at level 5 where language arts appeared easier than expected.


Discussion

1. While many additional validation issues remain, this preliminary empirical analysis based on the field test data indicates that the WIDA Standards provide a strong basis for distinguishing among proficiency levels of ELLs.


Discussion

2. The operational plan for ongoing WIDA assessment item renewal and development provides an opportunity to tighten item specifications based on empirical research while operationalizing the WIDA Standards.


Process of test development

1. Theory and Research

2. Standards

3. Specifications

4. Assessment


Study 2: Validation evidence from the bridge study

What can we learn about ACCESS for ELLs™ from the WIDA Consortium’s bridge study?

Study 1: What is the relationship between performances on the older English language proficiency tests and on ACCESS for ELLs™?

Study 2: What is the relationship between the “cut score” denoting the highest level of proficiency on the older tests and the predicted corresponding score on ACCESS for ELLs™ in terms of ACCESS proficiency levels?


Purpose of the bridge study

To help WIDA Consortium member states understand the performances of their ELLs in acquiring English on the older tests (for which they had data) in terms of the new test, especially to:

meet compliance with Title III requirements

provide continuity of data flow for cohorts of English language learners identified in 2002-03, the baseline year

provide information that may help determine Annual Measurable Achievement Objectives (AMAOs) for the established cohorts in the transitional year


The older tests

IDEA Proficiency Test (IPT)

Language Assessment Scales (LAS)

Language Proficiency Test Series (LPTS)

Maculaitis II (MAC II)

NOTE: The first three tests do NOT have separate scores for listening and speaking!


WIDA levels of English Language Proficiency

[Figure: the WIDA proficiency levels 1 Entering, 2 Beginning, 3 Developing, 4 Expanding, 5 Bridging, with a marker at 4.5.]


Participants

4,985 students from IL and RI

GRADE IPT LAS LPTS MAC II TOTAL

K 102 81 246 47 476

1 109 184 216 95 604

2 143 137 246 76 602

3 102 80 290 63 535

4 82 57 146 74 359

5 104 32 216 57 409

6 116 55 142 97 410

7 111 110 58 110 389

8 106 62 48 142 358

9 28 12 150 134 324

10 37 17 120 106 280

11 30 2 92 79 203

12 9 2 31 43 85

Total 1,079 831 1,952 1,123 4,985


Procedures

2005 operational ACCESS administration (AL, ME, VT)

Participating students in IL and RI were administered the older test and the operational ACCESS within a 6-8 week window

The older test was scored within local districts following their standard procedures, and the scores were submitted to the ACCESS scoring vendor

ACCESS was scored as part of Spring 2005 operational scoring

Data matched by ACCESS scoring vendor

Older test data cleaned at CAL

Analyses at CAL


Analyses: Study 1

Pearson correlations between performances on each form of the older tests (raw or scale score) and ACCESS for ELLs™ scale scores

Because each form of the older tests was unique, 64 correlational analyses were performed

IPT (14)

LAS (14)

LPTS (16)

MAC II (20)

Summarized by averaging


Results: Study 1 example (IPT Reading)

IPT Reading Score with ACCESS Reading Scale Score

IPT Form (Read) ACCESS Read Scale Score

IPT_EL IPT Read Raw Score Pearson Correlation .741**

N 205

IPT_R_1AB IPT Read Raw Score Pearson Correlation .540**

N 250


N 296


N 317


Results: Study 1 summary range

Average Correlations (All Levels of Each Test within Domain)

Test List Speak Read Write

IPT 0.601 0.625 0.653 0.631

LAS 0.503 0.570 0.591 0.525

LPTS 0.603 0.651 0.741 0.675

MAC II 0.433 0.453 0.593 0.509


Results: Study 1 summary by test across domains

Same average correlations as in the summary table, compared across tests: LPTS has the highest correlation with ACCESS for ELLs® in every domain (listening 0.603, speaking 0.651, reading 0.741, writing 0.675).


Results: Study 1 summary by domain across tests

Same average correlations, compared across domains: for every test, reading has the highest correlation (IPT 0.653, LAS 0.591, LPTS 0.741, MAC II 0.593) and listening the lowest (IPT 0.601, LAS 0.503, LPTS 0.603, MAC II 0.433).


Discussion: Study 1

Generally moderate to high correlations between ACCESS for ELLs® and the older tests: ACCESS appears to be assessing a similar construct (criterion-related validity) but is not interchangeable with the older tests

Across all tests, correlations were highest for reading: is reading the domain most familiar to students and test developers?

Across all tests, correlations were lowest for listening, but three tests did not have separate scores for listening and speaking!

Across domains, correlations between LPTS and ACCESS for ELLs® were highest; LPTS is the newest of the ‘older generation’ tests


Analyses: Study 2

From the predicted score tables, we found, for each grade level, the ACCESS for ELLs® proficiency level score corresponding to the “cut score” of the highest proficiency level on the older test

Summarized findings by calculating averages and standard deviations


Predicted scores table example

Predicted ACCESS = 270.1 + 1.738 * LAS

LAS RW 2AB Writing Raw Score to WIDA ACCESS Writing Scale Score

Columns: LAS RW 2AB writing raw score; LAS proficiency level (writing, grades 4, 5, 6); predicted ACCESS writing scale score; ACCESS proficiency level (writing, grades 4 and 5); ACCESS proficiency level (writing, grade 6)

0 1 270 1.9 1.8

1 1 272 1.9 1.8

2 1 274 1.9 1.8

3 1 275 1.9 1.8

4 1 277 1.9 1.9

5 1 279 1.9 1.9

… … … … …

27 1 317 2.9 2.4

28 1 319 3.0 2.4

29 1 321 3.1 2.5

30 1 322 3.1 2.5

31 1 324 3.1 2.5

32 1 326 3.1 2.5

33 2 327 3.1 2.5

… … … … …

55 3 366 4.5 3.7


Finding the WIDA proficiency level score example

Predicted ACCESS = 270.1 + 1.738 * LAS

LAS RW 2AB Writing Raw Score to WIDA ACCESS Writing Scale Score

Columns: LAS RW 2AB writing raw score; LAS proficiency level (writing, grades 4, 5, 6); predicted ACCESS writing scale score; ACCESS proficiency level (writing, grades 4 and 5); ACCESS proficiency level (writing, grade 6)

… … … … …

42 2 343 3.8 3.2

43 2 345 3.9 3.2

44 3 347 3.9 3.3

45 3 348 3.9 3.3

46 3 350 4.0 3.3

… … … … …

The first raw score shown at LAS proficiency level 3 is 44, which predicts an ACCESS writing scale score of 347, corresponding to ACCESS proficiency level scores of 3.9 (grades 4 and 5) and 3.3 (grade 6).
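The regression line printed on these slides can be applied directly. A minimal sketch, assuming predicted scores are rounded to the nearest whole scale score point (this reproduces every predicted score shown in the two tables); the equation applies only to the LAS RW 2AB writing form for grades 4 to 6, and the helper names are hypothetical.

```python
INTERCEPT, SLOPE = 270.1, 1.738  # Predicted ACCESS = 270.1 + 1.738 * LAS (from the slides)


def predicted_access(las_raw: int) -> int:
    """Predicted ACCESS writing scale score for an LAS RW 2AB writing raw score.

    Rounding to the nearest whole scale score is an assumption.
    """
    return round(INTERCEPT + SLOPE * las_raw)


def first_raw_at_level(las_levels: dict, level: int) -> int:
    """Lowest raw score classified at `level` on the older test (its "cut score")."""
    return min(raw for raw, lvl in las_levels.items() if lvl == level)


# Raw score -> LAS proficiency level for the rows shown on the slide (grades 4, 5, 6).
las_levels = {42: 2, 43: 2, 44: 3, 45: 3, 46: 3}

cut_raw = first_raw_at_level(las_levels, 3)
print(cut_raw, predicted_access(cut_raw))  # 44 347
```

Converting 347 into a proficiency level score then requires the ACCESS writing cut scores for each grade, which are not reproduced here; the table gives the results (3.9 and 3.3).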


Truncated example results: Listening

Test K 1 2 3 … 11 12

Listening

IPT 6.0 6.0 6.0 4.3 … 4.8 4.8

LAS 4.7 6.0 6.0 4.7 … 4.4 4.4

LPTS 3.1 3.4 3.8 3.7 … 3.0 3.0

MAC II 3.3 3.3 5.2 3.3 … 2.9 2.9


Results: Study 2 summary range

Average Proficiency Level Score (Standard Deviation)

Test List Speak Read Write

IPT 4.9 (0.80) 4.0 (0.36) 3.9 (0.97) 2.9 (0.64)

LAS 4.8 (0.67) 5.1 (0.81) 3.1 (1.11) 3.1 (0.67)

LPTS 3.5 (0.53) 2.9 (0.79) 5.3 (0.71) 3.9 (0.74)

MAC II 3.7 (0.78) 3.5 (0.74) 3.5 (0.76) 3.0 (0.40)


Interpretation: Highest test and domain

[Figure: the WIDA proficiency levels 1 Entering, 2 Beginning, 3 Developing, 4 Expanding, 5 Bridging, with the position of LPTS Reading marked.]


Interpretation: Lowest test and domain

[Figure: the same WIDA proficiency level scale (1 Entering to 5 Bridging), with the positions of LPTS Reading and IPT Writing marked.]


Results: Study 2 High and low by test across domains

Same average proficiency level scores as in the Study 2 summary table, compared within each test:

IPT: highest in listening (4.9), lowest in writing (2.9)

LAS: highest in speaking (5.1), lowest in reading and writing (3.1 each)

LPTS: highest in reading (5.3), lowest in speaking (2.9)

MAC II: highest in listening (3.7), lowest in writing (3.0)


Results: Study 2 High and low by domain across tests

Same average proficiency level scores, compared within each domain:

Listening: highest IPT (4.9), lowest LPTS (3.5)

Speaking: highest LAS (5.1), lowest LPTS (2.9)

Reading: highest LPTS (5.3), lowest LAS (3.1)

Writing: highest LPTS (3.9), lowest IPT (2.9)


Discussion: Study 2 (1 of 3)

Results varied widely, from a close relationship to the WIDA proficiency span (LPTS Reading) to much lower. In general, though, “cut scores” on the older tests tended to be much lower than the WIDA 6.0:

Were ELLs exited too early under the older tests?

Do the ACCESS for ELLs™ standards and performance level definitions better align with the levels of English proficiency needed for academic success?

With a single test across districts within a state, states will have clearer data to better understand the development of English proficiency in ELLs and its relationship to academic achievement.

Discussion: Study 2 (2 of 3)

Results varied widely across tests and domains: LPTS, with the highest “cut scores” in reading and writing, had the lowest “cut scores” in listening and speaking. But three tests, including LPTS, did not have separate scores for listening and speaking! LPTS had only “fluent”/“non-fluent” listening and speaking categories?

Discussion: Study 2 (3 of 3)

Across tests, writing had the lowest “cut scores” for three of four tests: is writing on ACCESS for ELLs™ unduly hard, or is it more indicative of what is needed for academic success?


Important considerations in interpretations

CONTENT differences among all five tests include:

Degree of alignment with English language proficiency and academic content standards

Number and types of items in each subsection or language domain

Depth of knowledge of the items

Inclusion of the language of math, science, and social studies

Ceiling levels of the measures

Rubrics used for interpreting speaking and writing

METHODOLOGICAL caveats include:

Use of linear regression across all analyses

Sometimes small numbers of students in subgroups

Distribution of observed scores (Spring testing)


Preliminary conclusions

Correlational data show strong support for ACCESS for ELLs™ as a measure of English proficiency (criterion-related validity)

Comparison of “cut scores” indicates that the WIDA Standards, as operationalized by ACCESS for ELLs™, describe a longer proficiency continuum than the older tests

Additional studies are needed to explore the relationship between that extended continuum and academic achievement


Validity evidence from the grade level cut score review study

75 teachers from 14 WIDA states

Examined test items and (for writing and speaking) examinee performances in light of the WIDA Standards’ model performance indicators and performance level descriptors

Through a structured process, proposed grade level cut scores, starting from empirically derived proposals based on the current cluster level cut scores

As in the original standard setting study, rated their confidence that the cut scores represented the different performance levels

Results: confidence increased greatly over the original study


Evaluations from grade level cut score review

Averages across all participants

How confident are you in the cut scores? (4 = hi, 1 = lo)

Red = below 3.10 / Black = 3.11 to 3.40 / Green = above 3.40

Orig = original standard setting study; Rev = grade level cut score review

Cut Read-Orig Read-Rev Write-Orig Write-Rev List-Orig List-Rev Speak-Orig Speak-Rev

1/2 3.08 3.41 3.39 3.46 3.22 3.51 3.24 3.46

2/3 2.83 3.47 3.28 3.43 3.15 3.55 3.01 3.39

3/4 2.98 3.48 3.33 3.36 3.17 3.57 2.89 3.37

4/5 3.05 3.54 3.33 3.35 3.19 3.53 2.84 3.37

5/6 3.01 3.52 3.33 3.41 3.18 3.60 2.97 3.56
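The change in confidence can be read off the table by subtracting each original rating from its revised counterpart. A short sketch using the averages above, copied as (Orig, Rev) pairs for cuts 1/2 through 5/6:

```python
from statistics import mean

# (Orig, Rev) average confidence ratings for cuts 1/2 to 5/6, per domain, from the table.
RATINGS = {
    "Read": [(3.08, 3.41), (2.83, 3.47), (2.98, 3.48), (3.05, 3.54), (3.01, 3.52)],
    "Write": [(3.39, 3.46), (3.28, 3.43), (3.33, 3.36), (3.33, 3.35), (3.33, 3.41)],
    "List": [(3.22, 3.51), (3.15, 3.55), (3.17, 3.57), (3.19, 3.53), (3.18, 3.60)],
    "Speak": [(3.24, 3.46), (3.01, 3.39), (2.89, 3.37), (2.84, 3.37), (2.97, 3.56)],
}

for domain, pairs in RATINGS.items():
    gains = [rev - orig for orig, rev in pairs]
    print(f"{domain}: mean gain {mean(gains):.2f}, smallest {min(gains):.2f}")
```

Every cell rises, but the gains are largest for reading and speaking and smallest for writing, where the original ratings were already relatively high.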










Other validity studies underway at CAL

Some ongoing internal research at CAL:

(1) What do we learn from the results of the technical analyses of Series 100 to improve item and form specifications?

(2) How do we improve the construction of items so that they are appropriate (both in content and in empirical results) to their targeted proficiency levels?

(3) What evidence do we have that ACCESS for ELLs tests the language of the content areas and not knowledge of the content areas?


#1 Example from Series 100 analyses

[Figure 8.3.1D: Test Information Function: List 3-5 ABC (x-axis: ability measure, -8 to 7; y-axis: information, 0 to 6).]


#1 Example from Series 100 analyses

[Figure 8.3.2D: Test Information Function: Read 3-5 ABC (x-axis: ability measure, -8 to 7; y-axis: information, 0 to 7).]
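Under the Rasch model, a test information function like the ones plotted is the sum of the item informations P(1 - P), so information peaks where item difficulties cluster. A sketch with an invented form (difficulties in logits); the actual Series 100 item parameters are not reproduced here, and the function names are hypothetical.

```python
from math import exp


def rasch_item_information(theta: float, b: float) -> float:
    """Fisher information of one Rasch item at ability theta: P * (1 - P)."""
    p = 1 / (1 + exp(-(theta - b)))
    return p * (1 - p)


def information_at(theta: float, difficulties: list) -> float:
    """Test information: the sum of the item informations."""
    return sum(rasch_item_information(theta, b) for b in difficulties)


# Hypothetical form: most items in the middle of the ability range, few at the ends.
form = [-3.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 3.0]
for theta in range(-8, 8, 2):
    info = information_at(theta, form)
    print(theta, round(info, 2), "SE ~", round(1 / info ** 0.5, 2))
```

Because the standard error of an ability estimate is roughly 1/sqrt(information), low stretches of the curve mark ability ranges, and therefore proficiency levels, where a form measures less precisely.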


#2 Example 3-5 Read Prof Level 2

5. R_2526_SIp2g35_PlantSale 2 302 1

1. R_2999_SIp2g35_FamilyNight 2 272 1

3. R_2675_SIp2g35_Artwork 2 327 2

9. R_2871_LAp2g35_AngelaPepper 2 321 2

10. R_2172_LAp2g35_KangarooDream 2 329 2

8. R_2870_LAp2g35_AngelaPepper 2 335 3

9. R_2171_LAp2g35_KangarooDream 2 365 5


#2 Example 3-5 Read Prof Level 5

13. R_2535_LAp5g35_AthleteBio 5 271 1

3. R_2541_SIp5g35_PlaygroundRules 5 332 3

12. R_2534_LAp5g35_AthleteBio 5 373 5


Interaction of Performance Level Descriptions and model Performance Indicators

[Figure: Language Proficiency (Performance Level Descriptions) 1 Entering, 2 Beginning, 3 Developing, 4 Expanding, 5 Bridging, aligned with model performance indicators (PIs) L1 to L5 across three dimensions: Linguistic Complexity, Vocabulary Usage, Language Control.]


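#3 Confirmatory Factor Analyses (SEM)

[Path diagram: observed reading scores by standard (RSI, RLA, RMA, RSC, RSS) and listening scores by standard (LSI, LLA, LMA, LSC, LSS), plus ListScore and ReadScore, with factors L-prof, R-prof, Engprof, and standard factors SI, LA, MA, SC, SS.]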


Other research (and possibilities)

1. Native speaker studies (Alabama data)

2. Relationship between performance on ACCESS for ELLs and state content tests (?)


Logistic regression with state data?

[Figure: students plotted by outcome (Yes or No) against ACCESS Scale Score (low to high), with a probability curve rising from lo% to hi% and an 80% probability point marked.]
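The slide poses logistic regression on state data as a possibility rather than a result. A sketch of what it could look like, assuming a yes/no outcome per student (for example, meeting a state content test standard) and the ACCESS scale score as the only predictor. The data are invented, and the fit uses plain Newton-Raphson rather than any particular statistics package; inverting the fitted curve gives the scale score at which the predicted probability reaches 80%.

```python
from math import exp, log


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    The predictor is centered internally for numerical stability; the
    returned (b0, b1) are on the original scale-score metric.
    """
    center = sum(x) / len(x)
    a0 = a1 = 0.0  # coefficients for the centered predictor
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            z = xi - center
            p = 1 / (1 + exp(-(a0 + a1 * z)))
            w = p * (1 - p)
            g0 += yi - p  # gradient of the log-likelihood
            g1 += (yi - p) * z
            h00 += w  # entries of the negative Hessian
            h01 += w * z
            h11 += w * z * z
        det = h00 * h11 - h01 * h01
        a0 += (h11 * g0 - h01 * g1) / det  # Newton step: solve the 2x2 system
        a1 += (h00 * g1 - h01 * g0) / det
    return a0 - a1 * center, a1


def score_at_probability(b0, b1, p):
    """Scale score at which the fitted probability equals p (inverse logistic)."""
    return (log(p / (1 - p)) - b0) / b1


# Invented data: ACCESS scale scores and a yes (1) / no (0) outcome per student.
scores = [250, 270, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 400, 420]
outcome = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1]

b0, b1 = fit_logistic(scores, outcome)
print(round(score_at_probability(b0, b1, 0.8), 1))
```

The data must overlap (some "No" students above some "Yes" students); with perfectly separated outcomes the maximum-likelihood slope does not exist and the iterations would diverge.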




4. So what does this mean for using scores on ACCESS for ELLs®?

Be sure to understand the meaning of scale scores and proficiency level scores.

Have confidence using scores, knowing that:

the reliability (consistency) of the scale scores is high, in particular for the overall composite score;

the accuracy of classification based on the overall composite is also high;

initial validity studies strongly support the use of ACCESS for ELLs® test scores as a valid indicator of levels of proficiency in accordance with the WIDA Standards;

the WIDA Consortium supports a rigorous program of ongoing test improvement, supported by research;

the WIDA Consortium continues to collect evidence in support of the validity of the use of test scores.

For more information, please contact the WIDA Hotline: 1-866-276-7735 or www.wida.us/helpform

World Class Instructional Design and Assessment, www.wida.us

Center for Applied Linguistics, www.cal.org

Metritech, Inc., www.metritech.com
