Engineering PLDs and Test Content to Engineer Cut Scores, or Aligning Test Development with Intended Score Interpretations

Steve Ferrara

A presentation in Engineered Cut Scores: Aligning Standard Setting Methodology with Contemporary Assessment Design Principles, a session at the National Conference on Student Assessment, June 22, 2016
Overview

Define and illustrate test content-PLD alignment
Principled design and validity argumentation call for this alignment
Engineering PLDs
Engineering test forms
Engineering items
Concept not new; engineering PLDs, test forms, and items is a work in progress (WIP)
Premise

Need a broader conception of alignment to engineer cut scores
Proposed: Alignment is the degree to which an item's response demands are consistent with the knowledge and skill requirements described in the corresponding PLD
Aside: We also need a broader type of alignment to provide evidence to support intended score interpretations and uses
Response demands: content, cognitive, and linguistic features of items that are related to item difficulty and quality (i.e., discrimination)
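To make this premise concrete, here is a minimal sketch of coding an item's response demands and checking them against a PLD's knowledge and skill requirements. The data structures (`ResponseDemands`, `PLDRequirements`), the demand bands, and the matching rule are hypothetical illustrations, not the presenter's operational method; the demand codes themselves are introduced later in the deck.

```python
from dataclasses import dataclass

@dataclass
class ResponseDemands:
    """Content, cognitive, and linguistic features of an item (hypothetical coding)."""
    dok_level: int              # Depth of Knowledge: 1=recall, 2=skill/concept, 3=strategic thinking
    relational_complexity: int  # number of facts, concepts, and skills to be processed
    n_prepositions: int         # proxy for linguistic complexity

@dataclass
class PLDRequirements:
    """KSA requirements described in a range PLD, expressed as assumed demand bands."""
    level: int
    dok_range: range
    rc_range: range

def is_aligned(item: ResponseDemands, pld: PLDRequirements) -> bool:
    """An item is aligned when its demands fall within the PLD's required bands."""
    return (item.dok_level in pld.dok_range
            and item.relational_complexity in pld.rc_range)

# Example: a Level 3 PLD assumed to expect DOK 2-3 and moderate relational complexity.
level3 = PLDRequirements(level=3, dok_range=range(2, 4), rc_range=range(4, 7))
item_3874 = ResponseDemands(dok_level=2, relational_complexity=5, n_prepositions=5)
print(is_aligned(item_3874, level3))  # True under these assumed bands
```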
What is test content-PLD alignment?

PARCC PLDs, grade 6 Reading: http://www.parcconline.org/assessments/test-design/ela-literacy/ela-performance-level-descriptors
[Figure: PARCC grade 6 Reading PLDs arranged from Low to High. Each level pairs the PLD with item response demands (content, cognitive, linguistic) consistent with that level's KSAs; adjacent levels are aligned and articulated with one another.]

Level 4: Meets Expectations for assessed content
◦ Very complex text: General accuracy and understanding
◦ Moderately complex text: General accuracy and understanding
◦ Readily accessible text: Mostly accurate analyses, showing understanding
Level 3: Approaches Expectations for assessed content
◦ Very complex text: Minimal accuracy and understanding
◦ Moderately complex text: General accuracy, basic understanding
◦ Readily accessible text: Mostly accurate analyses, showing understanding
Level 2: Partially Meets Expectations for assessed standards
◦ Very complex text: Inaccurate analysis, limited understanding
◦ Moderately complex text: Minimal accuracy and understanding
◦ Readily accessible text: Partial accuracy and understanding
Test content-PLD alignment and articulation are not automatic

[Figure: examples of misalignment between item response demands and PLD KSAs at each level, and of alignment within and across grades.]

Ferrara, Svetina, Skucha, & Murphy, 2011
Linking reading comprehension items to the Common European Framework of Reference (CEFR)

Figueras, N., Kaftandjieva, F., & Takala, S. (2013)
Principled approaches to assessment design, development, and implementation

Several names, several conceptualizations; common elements, varying details (Ferrara, Lai, Reilly, & Nichols, 2017)
◦ Evidence Centered Design (ECD)
◦ Assessment Engineering (AE)
◦ Cognitive Design Systems (CDS)
◦ BEAR Assessment System
◦ Principled Design for Efficacy (PDE)
Principled approaches: Design (etc.) for alignment

Table 1. Foundation and Organizing Elements of Principled Approaches to Assessment Design, Development, and Implementation and their Relationship to the Assessment Triangle

Framework Elements | Assessment Triangle Alignment
Organizing Element
◦ Ongoing accumulation of evidence to support validity arguments | Overall evidentiary reasoning goal
Foundational Elements
◦ Clearly defined assessment targets | Cognition
◦ Statement of intended score interpretations and uses | Cognition
◦ Model of cognition, learning, or performance | Cognition
◦ Aligned measurement models and reporting scales | Interpretation
◦ Manipulation of assessment activities to align with assessment targets and intended score interpretations and uses | Observation

From Ferrara, Lai, Reilly, & Nichols (2017). Principled approaches are Assessment Engineering, BEAR Assessment System, Cognitive Design System, Evidence Centered Design, and Principled Design for Efficacy.
Logic of inferences

[Figure: intended score interpretations drive alignment. PLDs come first; everything else, including items, is aligned to PLDs.]

Ferrara, Lai, Reilly, & Nichols, 2017, Fig. 1
Framework for developing PLDs
From Egan, Schneider, & Ferrara, 2012, Table 2
Guidance on engineering PLDs

Develop PLDs to guide test development, articulate standards across grades, and link to learning trajectories (Bejar, Braun, & Tannenbaum, 2006, 2007)
Policy definitions
◦ Generic, written by policy makers (Loomis & Bourque, 2001; Perie, 2008)
◦ Number of levels; labels and meaning (Beck, 2003; Burt & Stapleton, 2010; Cizek & Bunch, 2007; Egan, Schneider, & Ferrara, 2012; Perie, 2008; Zieky, Perie, & Livingston, 2008)
Range and other PLDs
◦ Explicit about content knowledge (Egan et al., 2012; Mills & Jaeger, 1998; U.S. Department of Education, 2004)
◦ Explicit about cognitive processes (Egan et al., 2012; Perie, 2008)
◦ Nouns and verbs, defining phrases (Egan et al., 2012)
I would like to find guidance on working from policy PLDs to content standards to range PLDs
How can we pursue alignment?

Code items for response demands
Determine which items are aligned with the KSA requirements in the corresponding PLD
Build aligned test forms
◦ Select only aligned items (see the sketch below)
Item development
◦ Develop item specifications and item writer training to improve alignment
◦ Re-field test items that are not aligned to PLDs
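As an illustration of the "select only aligned items" step, the sketch below filters a coded item pool down to items whose demands fall inside each performance level's assumed bands. The pool, the coding fields, and the alignment rule are hypothetical; an operational form build would also balance blueprints, statistics, and exposure.

```python
# Hypothetical item pool: each item carries response-demand codes and the
# performance level whose PLD claim it is intended to support.
item_pool = [
    {"id": "3874", "dok": 2, "relational_complexity": 5, "target_level": 3},
    {"id": "3182", "dok": 3, "relational_complexity": 6, "target_level": 2},
    {"id": "3175", "dok": 2, "relational_complexity": 5, "target_level": 2},
]

# Hypothetical per-level demand bands derived from range PLDs (inclusive min, max).
pld_bands = {
    2: {"dok": (1, 2), "relational_complexity": (1, 5)},
    3: {"dok": (2, 3), "relational_complexity": (4, 6)},
}

def aligned(item, bands):
    """True when every coded demand falls inside the band for the item's target level."""
    band = bands[item["target_level"]]
    return all(band[k][0] <= item[k] <= band[k][1] for k in band)

# Build the form from aligned items only; flag the rest for revision or re-field testing.
form = [it["id"] for it in item_pool if aligned(it, pld_bands)]
rejects = [it["id"] for it in item_pool if not aligned(it, pld_bands)]
print("form:", form, "flagged:", rejects)  # form: ['3874', '3175'] flagged: ['3182']
```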
WIP: Next study, I hope
Selected item response demand codes

Question Type
◦ The cognitive task an item poses (e.g., explain, analyze)
Depth of Knowledge
◦ Recall, skill/concept, strategic thinking
Relational Complexity
◦ Number of facts, concepts, and skills to be processed
Linguistic Complexity
◦ Number of prepositional phrases, as a proxy (see the coding sketch after this list)
Command of Textual Evidence
◦ Single or multiple pieces of text
Response Mode
◦ Selected response, multiple responses, constructed response
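The deck notes later that some of this coding is semi-automated. Below is a minimal sketch of one such proxy: counting preposition tokens in an item's text as a stand-in for prepositional phrases, and hence linguistic complexity. The word list, tokenizer, and example stem are simplistic assumptions; a real coder would use a part-of-speech tagger.

```python
import re

# A small, deliberately incomplete set of common English prepositions (assumed list).
PREPOSITIONS = {
    "about", "above", "across", "after", "against", "among", "around", "at",
    "before", "behind", "below", "between", "by", "during", "for", "from",
    "in", "into", "of", "off", "on", "over", "through", "to", "under", "with",
}

def count_prepositions(text: str) -> int:
    """Crude linguistic-complexity proxy: count preposition tokens.

    Counting prepositional *phrases* would require a POS tagger and parser;
    this word-list lookup also over-counts particles (e.g., 'to' in infinitives).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(token in PREPOSITIONS for token in tokens)

# Hypothetical item stem, not taken from the deck.
stem = ("Based on the excerpt, which detail from the passage best supports "
        "the author's point about the birds' feathers?")
print(count_prepositions(stem))  # 3 under this assumed word list
```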
Excerpt from Of Fat, Feathers…; item 3874

[Passage excerpt shown on the slide, with pointers marking the relevant evidence.]
Item 3874

Item response demands
◦ Question Type: USE
◦ DOK level: 2, Skill/concept
◦ Relational Complexity: 5
◦ Number of Prepositions: 5
◦ Command of Textual Evidence: Low
◦ Response Mode: Low

PLD alignment target: Moderately complex text: Minimal accuracy and understanding. Do correct responses to this item support this intended interpretation (or claim)?
Item 3182

Item response demands
◦ Question Type: USE and INF
◦ DOK level: 3, Strategic thinking
◦ Relational Complexity: 6 or more
◦ Number of Prepositions: 7
◦ Command of Textual Evidence: Low-Moderate
◦ Response Mode: Low

PLD alignment target: Readily accessible text: Partial accuracy and understanding. Do correct responses to this item support this intended interpretation (or claim)?
Excerpt from A Single Shard: item 3175

Prior knowledge of "arid" might help, but "arid bones" is figurative language.
"Arid" seems pretty clear; "cold," "long," and "rotten" seem unlikely. So what makes this relatively difficult?
Item 3175

Item response demands
◦ Question Type: USE or INF
◦ DOK level: 2, Skill/concept
◦ Relational Complexity: 5 or more
◦ Number of Prepositions: 4
◦ Command of Textual Evidence: Low
◦ Response Mode: Low

PLD alignment target: Very complex text: Inaccurate analysis, limited understanding. Do correct responses to this item support this intended interpretation (or claim)?
Framework for engineering items

Content, cognitive, and linguistic response demands frameworks
◦ Select the item response demand codes most appropriate for the test and PLDs you are working with
Empirical support (see the regression sketch below)
◦ Reliable coding (some of it semi-automated)
◦ Important predictors of difficulty and discrimination
◦ R² in the range of .15 to .30: need more work (e.g., additional or better frameworks, learning science literature reviews, the opportunity-to-learn question)
Still working out PLD-item alignment criteria
◦ For now, normative criteria
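A minimal sketch of the kind of empirical check described above: regress item difficulty on coded response demands and report R². The item data are simulated purely for illustration (with the signal tuned so R² falls near the deck's reported range); only the technique, ordinary least squares via numpy, is real.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated coded demands for 200 hypothetical items: DOK level,
# relational complexity, preposition count (linguistic complexity proxy).
n = 200
X = np.column_stack([
    rng.integers(1, 4, n),    # DOK: 1-3
    rng.integers(1, 8, n),    # relational complexity
    rng.integers(0, 10, n),   # number of prepositions
])

# Simulated difficulty (e.g., an IRT b parameter): a weak linear signal plus
# noise, chosen so R^2 lands near the .15-.30 range reported in the deck.
b = X @ np.array([0.5, 0.15, 0.08]) + rng.normal(0.0, 1.0, n)

# Ordinary least squares with an intercept.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, b, rcond=None)

resid = b - design @ coef
r2 = 1 - (resid @ resid) / ((b - b.mean()) @ (b - b.mean()))
print("coefficients:", np.round(coef, 2))
print("R^2:", round(float(r2), 2))
```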
Engineering items (cont.)

Theoretically, once you know some things about the relationship between item response demands and item difficulty, you can:
◦ Create item templates to guide development of more items with similar difficulties
◦ Engineer existing items to hit difficulty targets (see the sketch below)

We're just getting started
◦ Gierl, M. J., & Haladyna, T. M. (Eds.). (2012). Automated item generation: Theory and practice. New York: Routledge.
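One hedged reading of "engineer existing items to hit difficulty targets": plug a draft item's demand codes into a fitted demands-to-difficulty model and flag the item when the prediction strays from its target. The coefficients and tolerance below are placeholders standing in for output like the regression sketch above; none of it comes from the deck itself.

```python
# Placeholder fitted coefficients (intercept plus one weight per demand code),
# standing in for a demands-to-difficulty regression estimated elsewhere.
COEF = {"intercept": -1.0, "dok": 0.5, "relational_complexity": 0.15, "n_prepositions": 0.08}

def predicted_difficulty(demands: dict) -> float:
    """Linear prediction of difficulty from coded response demands."""
    return COEF["intercept"] + sum(COEF[k] * v for k, v in demands.items())

def screen(demands: dict, target: float, tolerance: float = 0.3) -> str:
    """Flag a draft item whose predicted difficulty misses the target band."""
    pred = predicted_difficulty(demands)
    if abs(pred - target) <= tolerance:
        return f"keep (predicted b = {pred:.2f})"
    hint = "reduce" if pred > target else "increase"
    return f"revise: {hint} demands (predicted b = {pred:.2f}, target {target:.2f})"

draft = {"dok": 3, "relational_complexity": 6, "n_prepositions": 7}
print(screen(draft, target=0.5))  # revise: reduce demands (predicted b = 1.96, ...)
```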
We-are-not-Alone (WANA) Alignment Model

Infrastructure <-A-> Policy <-A-> PLDs <-A-> Test Content
◦ Infrastructure (local): Professional development, curriculum materials, instructional approaches
◦ Policy (state): CCSS, NGSS
◦ PLDs: Range PLDs
◦ Test Content: Standards targeted, item types, item response demands
A = alignment (Coburn, Hill, & Spillane, 2016); enactment and implementation (McDonnell & Weatherford, 2016)
References

Beck, M. (2003, April). Standard setting: If it is science, it's sociology and linguistics, not psychometrics. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Bejar, I. I., Braun, H. I., & Tannenbaum, R. (2006). A prospective approach to standard setting. Paper presented in Assessing and modeling development in school: Intellectual growth and standard setting, October 19-20, University of Maryland, College Park.
Bejar, I. I., Braun, H. I., & Tannenbaum, R. (2007). A prospective approach to standard setting. In R. Lissitz (Ed.), Assessing and modeling cognitive development in school (pp. ***). Maple Grove, MN: JAM Press.
Burt, W. M., & Stapleton, L. M. (2010). Connotative meanings of student performance labels used in standard setting. Educational Measurement: Issues and Practice, 29(4), 28-38.
Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage.
Coburn, C. E., Hill, H. C., & Spillane, J. P. (2016). Alignment and accountability in policy design and implementation: The Common Core State Standards and implementation research. Educational Researcher, 45(4), 243-251.
Egan, K. L., Schneider, M. C., & Ferrara, S. (2012). Performance level descriptors: History, practice, and a proposed framework. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods, and innovations (2nd ed., pp. 79-106). New York: Routledge.
Ferrara, S., Lai, E., Reilly, A., & Nichols, P. (2017). Principled approaches to assessment design, development, and implementation: Cognition in score interpretation and use. In A. A. Rupp & J. P. Leighton (Eds.), The handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 41-74). Malden, MA: Wiley.
References (cont.)

Ferrara, S., Svetina, D., Skucha, S., & Murphy, A. (2011). Test design with performance standards and achievement growth in mind. Educational Measurement: Issues and Practice, 30(4), 3-15.
Loomis, S. C., & Bourque, M. L. (2001). From tradition to innovation: Standard setting on the National Assessment of Educational Progress. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 175-217). Mahwah, NJ: Lawrence Erlbaum Associates.
McDonnell, L. M., & Weatherford, M. S. (2016). Recognizing the political in implementation research. Educational Researcher, 45(4), 233-242.
Mills, C. N., & Jaeger, R. M. (1998). Creating descriptions of desired student achievement when setting performance standards. In L. Hansche (Ed.), Handbook for the development of performance standards: Meeting the requirements of Title I (pp. 73-85). Washington, DC: Council of Chief State School Officers.
Perie, M. (2008). A guide to understanding and developing performance-level descriptors. Educational Measurement: Issues and Practice, 27(4), 15-29.
U.S. Department of Education, Office of Elementary and Secondary Education. (2004). Standards and assessments peer review guidance: Information and examples for meeting requirements of the No Child Left Behind Act of 2001. Washington, DC: U.S. Department of Education, Office of Elementary and Secondary Education.
Zieky, M., Perie, M., & Livingston, S. (2008). Cut scores: A manual for setting performance standards on educational and occupational tests. Princeton, NJ: Educational Testing Service.