Engineering PLDs and Test Content to Engineer Cut Scores, or Aligning Test Development with Intended Score Interpretations

Steve Ferrara

A presentation in Engineered Cut Scores: Aligning Standard Setting Methodology with Contemporary Assessment Design Principles, a session at the National Conference on Student Assessment, June 22, 2016
Overview

Define and illustrate test content-PLD alignment
Principled design and validity argumentation call for this alignment
Engineering PLDs
Engineering test forms
Engineering items
Concept not new; engineering PLDs, test forms, and items is a work in progress (WIP)
Premise

Need a broader conception of alignment to engineer cut scores
Proposed: Alignment is the degree to which an item's response demands are consistent with the knowledge and skill requirements described in the corresponding PLD
Aside: We also need a broader type of alignment to provide evidence to support intended score interpretations and uses
Response demands: content, cognitive, and linguistic features of items that are related to item difficulty and quality (i.e., discrimination)
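To make this premise concrete, here is a minimal sketch of coding an item's response demands and checking them against a PLD's knowledge and skill requirements. The data structures (`ResponseDemands`, `PLDRequirements`), the demand bands, and the matching rule are hypothetical illustrations, not the presenter's operational method; the demand codes themselves are introduced later in the deck.

```python
from dataclasses import dataclass

@dataclass
class ResponseDemands:
    """Content, cognitive, and linguistic features of an item (hypothetical coding)."""
    dok_level: int              # Depth of Knowledge: 1=recall, 2=skill/concept, 3=strategic thinking
    relational_complexity: int  # number of facts, concepts, and skills to be processed
    n_prepositions: int         # proxy for linguistic complexity

@dataclass
class PLDRequirements:
    """KSA requirements described in a range PLD, expressed as assumed demand bands."""
    level: int
    dok_range: range
    rc_range: range

def is_aligned(item: ResponseDemands, pld: PLDRequirements) -> bool:
    """An item is aligned when its demands fall within the PLD's required bands."""
    return (item.dok_level in pld.dok_range
            and item.relational_complexity in pld.rc_range)

# Example: a Level 3 PLD assumed to expect DOK 2-3 and moderate relational complexity.
level3 = PLDRequirements(level=3, dok_range=range(2, 4), rc_range=range(4, 7))
item_3874 = ResponseDemands(dok_level=2, relational_complexity=5, n_prepositions=5)
print(is_aligned(item_3874, level3))  # True under these assumed bands
```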
What is test content-PLD alignment?

PARCC PLDs, grade 6 Reading: http://www.parcconline.org/assessments/test-design/ela-literacy/ela-performance-level-descriptors
[Figure: PARCC grade 6 Reading PLDs arranged from Low to High. Each level pairs the PLD with item response demands (content, cognitive, linguistic) consistent with that level's KSAs; adjacent levels are aligned and articulated with one another.]

Level 4: Meets Expectations for assessed content
◦ Very complex text: General accuracy and understanding
◦ Moderately complex text: General accuracy and understanding
◦ Readily accessible text: Mostly accurate analyses, showing understanding
Level 3: Approaches Expectations for assessed content
◦ Very complex text: Minimal accuracy and understanding
◦ Moderately complex text: General accuracy, basic understanding
◦ Readily accessible text: Mostly accurate analyses, showing understanding
Level 2: Partially Meets Expectations for assessed standards
◦ Very complex text: Inaccurate analysis, limited understanding
◦ Moderately complex text: Minimal accuracy and understanding
◦ Readily accessible text: Partial accuracy and understanding
Test content-PLD alignment and articulation are not automatic

[Figure: examples of misalignment between item response demands and PLD KSAs at each level, and of alignment within and across grades.]

Ferrara, Svetina, Skucha, & Murphy, 2011
Linking reading comprehension items to the Common European Framework of Reference (CEFR)

Figueras, N., Kaftandjieva, F., & Takala, S. (2013)
Principled approaches to assessment design, development, and implementation

Several names, several conceptualizations; common elements, varying details (Ferrara, Lai, Reilly, & Nichols, 2017)
◦ Evidence Centered Design (ECD)
◦ Assessment Engineering (AE)
◦ Cognitive Design Systems (CDS)
◦ BEAR Assessment System
◦ Principled Design for Efficacy (PDE)
Principled approaches: Design (etc.) for alignment

Table 1. Foundation and Organizing Elements of Principled Approaches to Assessment Design, Development, and Implementation and their Relationship to the Assessment Triangle

Framework Elements | Assessment Triangle Alignment
Organizing Element
◦ Ongoing accumulation of evidence to support validity arguments | Overall evidentiary reasoning goal
Foundational Elements
◦ Clearly defined assessment targets | Cognition
◦ Statement of intended score interpretations and uses | Cognition
◦ Model of cognition, learning, or performance | Cognition
◦ Aligned measurement models and reporting scales | Interpretation
◦ Manipulation of assessment activities to align with assessment targets and intended score interpretations and uses | Observation

From Ferrara, Lai, Reilly, & Nichols (2017). Principled approaches are Assessment Engineering, BEAR Assessment System, Cognitive Design System, Evidence Centered Design, and Principled Design for Efficacy.
Logic of inferences

[Figure: intended score interpretations drive alignment. PLDs come first; everything else, including items, is aligned to PLDs.]

Ferrara, Lai, Reilly, & Nichols, 2017, Fig. 1
Framework for developing PLDs
From Egan, Schneider, & Ferrara, 2012, Table 2
Guidance on engineering PLDs

Develop PLDs to guide test development, articulate standards across grades, and link to learning trajectories (Bejar, Braun, & Tannenbaum, 2006, 2007)
Policy definitions
◦ Generic, written by policy makers (Loomis & Bourque, 2001; Perie, 2008)
◦ Number of levels; labels and meaning (Beck, 2003; Burt & Stapleton, 2010; Cizek & Bunch, 2007; Egan, Schneider, & Ferrara, 2012; Perie, 2008; Zieky, Perie, & Livingston, 2008)
Range and other PLDs
◦ Explicit about content knowledge (Egan et al., 2012; Mills & Jaeger, 1998; U.S. Department of Education, 2004)
◦ Explicit about cognitive processes (Egan et al., 2012; Perie, 2008)
◦ Nouns and verbs, defining phrases (Egan et al., 2012)
I would like to find guidance on working from policy PLDs to content standards to range PLDs
How can we pursue alignment?

Code items for response demands
Determine which items are aligned with the KSA requirements in the corresponding PLD
Build aligned test forms
◦ Select only aligned items (see the sketch below)
Item development
◦ Develop item specifications and item writer training to improve alignment
◦ Re-field test items that are not aligned to PLDs
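As an illustration of the "select only aligned items" step, the sketch below filters a coded item pool down to items whose demands fall inside each performance level's assumed bands. The pool, the coding fields, and the alignment rule are hypothetical; an operational form build would also balance blueprints, statistics, and exposure.

```python
# Hypothetical item pool: each item carries response-demand codes and the
# performance level whose PLD claim it is intended to support.
item_pool = [
    {"id": "3874", "dok": 2, "relational_complexity": 5, "target_level": 3},
    {"id": "3182", "dok": 3, "relational_complexity": 6, "target_level": 2},
    {"id": "3175", "dok": 2, "relational_complexity": 5, "target_level": 2},
]

# Hypothetical per-level demand bands derived from range PLDs (inclusive min, max).
pld_bands = {
    2: {"dok": (1, 2), "relational_complexity": (1, 5)},
    3: {"dok": (2, 3), "relational_complexity": (4, 6)},
}

def aligned(item, bands):
    """True when every coded demand falls inside the band for the item's target level."""
    band = bands[item["target_level"]]
    return all(band[k][0] <= item[k] <= band[k][1] for k in band)

# Build the form from aligned items only; flag the rest for revision or re-field testing.
form = [it["id"] for it in item_pool if aligned(it, pld_bands)]
rejects = [it["id"] for it in item_pool if not aligned(it, pld_bands)]
print("form:", form, "flagged:", rejects)  # form: ['3874', '3175'] flagged: ['3182']
```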
WIP: Next study, I hope
Selected item response demand codes

Question Type
◦ The cognitive task an item poses (e.g., explain, analyze)
Depth of Knowledge
◦ Recall, skill/concept, strategic thinking
Relational Complexity
◦ Number of facts, concepts, and skills to be processed
Linguistic Complexity
◦ Number of prepositional phrases, as a proxy (see the coding sketch after this list)
Command of Textual Evidence
◦ Single or multiple pieces of text
Response Mode
◦ Selected response, multiple responses, constructed response
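The deck notes later that some of this coding is semi-automated. Below is a minimal sketch of one such proxy: counting preposition tokens in an item's text as a stand-in for prepositional phrases, and hence linguistic complexity. The word list, tokenizer, and example stem are simplistic assumptions; a real coder would use a part-of-speech tagger.

```python
import re

# A small, deliberately incomplete set of common English prepositions (assumed list).
PREPOSITIONS = {
    "about", "above", "across", "after", "against", "among", "around", "at",
    "before", "behind", "below", "between", "by", "during", "for", "from",
    "in", "into", "of", "off", "on", "over", "through", "to", "under", "with",
}

def count_prepositions(text: str) -> int:
    """Crude linguistic-complexity proxy: count preposition tokens.

    Counting prepositional *phrases* would require a POS tagger and parser;
    this word-list lookup also over-counts particles (e.g., 'to' in infinitives).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(token in PREPOSITIONS for token in tokens)

# Hypothetical item stem, not taken from the deck.
stem = ("Based on the excerpt, which detail from the passage best supports "
        "the author's point about the birds' feathers?")
print(count_prepositions(stem))  # 3 under this assumed word list
```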
Excerpt from Of Fat, Feathers…; item 3874

[Passage excerpt shown on the slide, with pointers marking the relevant evidence.]
Item 3874

Item response demands
◦ Question Type: USE
◦ DOK level: 2, Skill/concept
◦ Relational Complexity: 5
◦ Number of Prepositions: 5
◦ Command of Textual Evidence: Low
◦ Response Mode: Low

PLD alignment target: Moderately complex text: Minimal accuracy and understanding. Do correct responses to this item support this intended interpretation (or claim)?
Item 3182

Item response demands
◦ Question Type: USE and INF
◦ DOK level: 3, Strategic thinking
◦ Relational Complexity: 6 or more
◦ Number of Prepositions: 7
◦ Command of Textual Evidence: Low-Moderate
◦ Response Mode: Low

PLD alignment target: Readily accessible text: Partial accuracy and understanding. Do correct responses to this item support this intended interpretation (or claim)?
Excerpt from A Single Shard: item 3175

Prior knowledge of "arid" might help, but "arid bones" is figurative language.
"Arid" seems pretty clear; "cold," "long," and "rotten" seem unlikely. So what makes this relatively difficult?
Item 3175

Item response demands
◦ Question Type: USE or INF
◦ DOK level: 2, Skill/concept
◦ Relational Complexity: 5 or more
◦ Number of Prepositions: 4
◦ Command of Textual Evidence: Low
◦ Response Mode: Low

PLD alignment target: Very complex text: Inaccurate analysis, limited understanding. Do correct responses to this item support this intended interpretation (or claim)?
Framework for engineering items

Content, cognitive, and linguistic response demands frameworks
◦ Select the item response demand codes most appropriate for the test and PLDs you are working with
Empirical support (see the regression sketch below)
◦ Reliable coding (some of it semi-automated)
◦ Important predictors of difficulty and discrimination
◦ R² in the range of .15 to .30: need more work (e.g., additional or better frameworks, learning science literature reviews, the opportunity-to-learn question)
Still working out PLD-item alignment criteria
◦ For now, normative criteria
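A minimal sketch of the kind of empirical check described above: regress item difficulty on coded response demands and report R². The item data are simulated purely for illustration (with the signal tuned so R² falls near the deck's reported range); only the technique, ordinary least squares via numpy, is real.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated coded demands for 200 hypothetical items: DOK level,
# relational complexity, preposition count (linguistic complexity proxy).
n = 200
X = np.column_stack([
    rng.integers(1, 4, n),    # DOK: 1-3
    rng.integers(1, 8, n),    # relational complexity
    rng.integers(0, 10, n),   # number of prepositions
])

# Simulated difficulty (e.g., an IRT b parameter): a weak linear signal plus
# noise, chosen so R^2 lands near the .15-.30 range reported in the deck.
b = X @ np.array([0.5, 0.15, 0.08]) + rng.normal(0.0, 1.0, n)

# Ordinary least squares with an intercept.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, b, rcond=None)

resid = b - design @ coef
r2 = 1 - (resid @ resid) / ((b - b.mean()) @ (b - b.mean()))
print("coefficients:", np.round(coef, 2))
print("R^2:", round(float(r2), 2))
```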
Engineering items (cont.)

Theoretically, once you know some things about the relationship between item response demands and item difficulty, you can:
◦ Create item templates to guide development of more items with similar difficulties
◦ Engineer existing items to hit difficulty targets (see the sketch below)

We're just getting started
◦ Gierl, M. J., & Haladyna, T. M. (Eds.). (2012). Automated item generation: Theory and practice. New York: Routledge.
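One hedged reading of "engineer existing items to hit difficulty targets": plug a draft item's demand codes into a fitted demands-to-difficulty model and flag the item when the prediction strays from its target. The coefficients and tolerance below are placeholders standing in for output like the regression sketch above; none of it comes from the deck itself.

```python
# Placeholder fitted coefficients (intercept plus one weight per demand code),
# standing in for a demands-to-difficulty regression estimated elsewhere.
COEF = {"intercept": -1.0, "dok": 0.5, "relational_complexity": 0.15, "n_prepositions": 0.08}

def predicted_difficulty(demands: dict) -> float:
    """Linear prediction of difficulty from coded response demands."""
    return COEF["intercept"] + sum(COEF[k] * v for k, v in demands.items())

def screen(demands: dict, target: float, tolerance: float = 0.3) -> str:
    """Flag a draft item whose predicted difficulty misses the target band."""
    pred = predicted_difficulty(demands)
    if abs(pred - target) <= tolerance:
        return f"keep (predicted b = {pred:.2f})"
    hint = "reduce" if pred > target else "increase"
    return f"revise: {hint} demands (predicted b = {pred:.2f}, target {target:.2f})"

draft = {"dok": 3, "relational_complexity": 6, "n_prepositions": 7}
print(screen(draft, target=0.5))  # revise: reduce demands (predicted b = 1.96, ...)
```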
We-are-not-Alone (WANA) Alignment Model

Infrastructure <-A-> Policy <-A-> PLDs <-A-> Test Content
◦ Infrastructure (local): Professional development, curriculum materials, instructional approaches
◦ Policy (state): CCSS, NGSS
◦ PLDs: Range PLDs
◦ Test Content: Standards targeted, item types, item response demands
A = alignment (Coburn, Hill, & Spillane, 2016); enactment and implementation (McDonnell & Weatherford, 2016)
References

Beck, M. (2003, April). Standard setting: If it is science, it's sociology and linguistics, not psychometrics. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Bejar, I. I., Braun, H. I., & Tannenbaum, R. (2006). A prospective approach to standard setting. Paper presented in Assessing and modeling development in school: Intellectual growth and standard setting, October 19-20, University of Maryland, College Park.
Bejar, I. I., Braun, H. I., & Tannenbaum, R. (2007). A prospective approach to standard setting. In R. Lissitz (Ed.), Assessing and modeling cognitive development in school (pp. ***). Maple Grove, MN: JAM Press.
Burt, W. M., & Stapleton, L. M. (2010). Connotative meanings of student performance labels used in standard setting. Educational Measurement: Issues and Practice, 29(4), 28-38.
Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage.
Coburn, C. E., Hill, H. C., & Spillane, J. P. (2016). Alignment and accountability in policy design and implementation: The Common Core State Standards and implementation research. Educational Researcher, 45(4), 243-251.
Egan, K. L., Schneider, M. C., & Ferrara, S. (2012). Performance level descriptors: History, practice, and a proposed framework. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods, and innovations (2nd ed., pp. 79-106). New York: Routledge.
Ferrara, S., Lai, E., Reilly, A., & Nichols, P. (2017). Principled approaches to assessment design, development, and implementation: Cognition in score interpretation and use. In A. A. Rupp & J. P. Leighton (Eds.), The handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 41-74). Malden, MA: Wiley.
References (cont.)

Ferrara, S., Svetina, D., Skucha, S., & Murphy, A. (2011). Test design with performance standards and achievement growth in mind. Educational Measurement: Issues and Practice, 30(4), 3-15.
Loomis, S. C., & Bourque, M. L. (2001). From tradition to innovation: Standard setting on the National Assessment of Educational Progress. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 175-217). Mahwah, NJ: Lawrence Erlbaum Associates.
McDonnell, L. M., & Weatherford, M. S. (2016). Recognizing the political in implementation research. Educational Researcher, 45(4), 233-242.
Mills, C. N., & Jaeger, R. M. (1998). Creating descriptions of desired student achievement when setting performance standards. In L. Hansche (Ed.), Handbook for the development of performance standards: Meeting the requirements of Title I (pp. 73-85). Washington, DC: Council of Chief State School Officers.
Perie, M. (2008). A guide to understanding and developing performance-level descriptors. Educational Measurement: Issues and Practice, 27(4), 15-29.
U.S. Department of Education, Office of Elementary and Secondary Education. (2004). Standards and assessments peer review guidance: Information and examples for meeting requirements of the No Child Left Behind Act of 2001. Washington, DC: U.S. Department of Education, Office of Elementary and Secondary Education.
Zieky, M., Perie, M., & Livingston, S. (2008). Cut scores: A manual for setting performance standards on educational and occupational tests. Princeton, NJ: Educational Testing Service.