Getting More from Your Technical Advisory Committee: Designing and Implementing a Validity
Research Agenda
2015 National Conference on Student Assessment
Enis Dogan
PARCC Overview
• PARCC development and implementation
• TAC overview
• Determining the agenda
  • Research and Psychometrics Committee
  • Other Working Groups and PARCC State Leads
  • Guiding documents
  • Validity framework
  • Psychometric Roadmap
PARCC Overview
• Development initiated in 2011
• First research conducted in 2012
• Field trial in 2013
• Field tests in 2014
• First operational assessments in 2014-15 school year in:
• Arkansas
• Colorado
• District of Columbia
• Illinois
• Louisiana
• Maryland
• Massachusetts
• Mississippi
• New Jersey
• New Mexico
• Ohio
• Rhode Island
PARCC TAC
• Henry Braun (Boston College)
• Bob Brennan (University of Iowa)
• Derek Briggs (University of Colorado at Boulder)
• Linda Cook (Retired, ETS)
• Ronald Hambleton (University of Massachusetts, Amherst)
• Gerunda Hughes (Howard University)
• Huynh Huynh (University of South Carolina)
• Michael Kolen (University of Iowa)
• Suzanne Lane (University of Pittsburgh)
• Richard Luecht (University of North Carolina at Greensboro)
• Jim Pellegrino (University of Illinois at Chicago)
• Barbara Plake (University of Nebraska-Lincoln)
• Rachel Quenemoen (National Center on Educational Outcomes)
• Laurie Wise (Human Resources Research Organization, HumRRO)
Provides guidance on assessment design and development, and the research agenda of the consortium
Determining the agenda
• Research and Psychometrics Committee
• Other Working Groups and PARCC State Leads
• Guiding documents
• Psychometric Roadmap
• Validity framework
• Lists psychometric assumptions and decisions and provides a road map for making decisions on pending issues.
• The psychometric issues are categorized as follows:
  • PARCC Scaling Approach and Reporting Scale Characteristics
  • Claims and Subclaims Reporting
  • Scale Construction and Properties
  • Item Response Theory (IRT) Modeling
  • Mode and Device Comparability
  • Data Forensics
  • Linking Considerations
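Among these categories, IRT modeling governs how item responses are translated into ability estimates. As a hedged illustration only (the slides do not specify which IRT model PARCC adopted), a minimal sketch of the three-parameter logistic (3PL) item response function in Python:

```python
import math

def p_correct(theta, a, b, c):
    """3PL item response function: probability that an examinee with
    ability theta answers a dichotomous item correctly, given item
    discrimination a, difficulty b, and pseudo-guessing c.
    The 1.7 is the conventional scaling constant D."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# When ability matches item difficulty (theta == b), the probability
# sits halfway between the guessing floor c and 1.0:
print(round(p_correct(theta=0.0, a=1.0, b=0.0, c=0.2), 3))  # 0.6
```

The parameter values above are illustrative, not PARCC operational item parameters.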
Psychometric Roadmap
Psychometric Workplan Issues
• Determine properties of the primary (summative) reporting scale
• Determine number of digits for reported scale scores
• Establish rules defining the lowest and highest reported scale scores
• Determine how cut scores will be reported across performance levels, grades and subjects (i.e., determine scale anchors)
• Determine how transformations from raw scores to scale scores will be carried out
Assumptions and Decisions
• CCR cut score will be fixed so that the same value indicates CCR performance across all grades and content areas
References for Assumptions/Decisions
• PARCC Scale Score Brief (Sept 2014): “PARCC Score Scale Brief_090314.docx”
Outstanding Questions
• What is the range of the summative scale scores? (Evaluating the CSEMs may help inform this decision.)
• What are the lowest and highest obtainable scale scores (LOSS and HOSS) for the summative scores?
• What are the LOSS and HOSS for the sub-scores in reading and writing?
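The LOSS/HOSS questions interact directly with the raw-to-scale transformation decision listed earlier. As a minimal sketch (every numeric value below is a hypothetical assumption, not PARCC's operational scale), a linear transformation truncated at the obtainable bounds might look like:

```python
def to_scale_score(theta, slope, intercept, loss, hoss):
    """Map an ability estimate theta onto the reporting scale via a
    linear transformation, then truncate at the lowest/highest
    obtainable scale scores (LOSS/HOSS). All parameter values used
    below are illustrative, not PARCC's actual scale constants."""
    raw = slope * theta + intercept
    return max(loss, min(hoss, round(raw)))

# Hypothetical scale anchored so theta = 0 maps to a cut score of 750:
print(to_scale_score(0.0, slope=25, intercept=750, loss=650, hoss=850))   # 750
print(to_scale_score(-5.0, slope=25, intercept=750, loss=650, hoss=850))  # 650 (truncated at LOSS)
```

Choosing the per-grade slope and intercept so that the CCR cut lands on the same scale value in every grade is one way the "fixed CCR cut score" decision above could be realized; the deck does not state the method PARCC used.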
Psychometric Roadmap
Tasks and deadlines:
• Policy decision about number of digits for PARCC reporting scales: 3/31/2015
• PARCC scaling approach presentation and discussion with RAP: 4/2/2015
• Scaling approach discussion at research planning meeting: 4/29/2015
• Scaling consideration presentation to TAC: 6/17/2015
• Policy decision about the properties of the reading and writing sub-score scales: 6/30/2015
• Simulations with spring 2015 operational data: 7/27/2015
• Performance level setting meeting for high school: 7/27/2015
• Governing Board approves standards and summative scales for high school: 8/14/2015
• Performance level setting meeting for grades 3-8: 8/24/2015
• Governing Board approves standards and summative scales for grades 3-8: 9/11/2015
• In terms of planning and executing research studies to collect empirical validity evidence, the first step is to build and follow a framework around which the studies can be organized. Lack of connectedness among validity studies is a challenge in many assessment programs (Haladyna, 2006).
• We built our framework by first dividing the assessment development and implementation period into four phases:
  • Phase I: Defining measurement targets, item and test development
  • Phase II: Test delivery and administration
  • Phase III: Scoring, scaling, standard setting
  • Phase IV: Reporting, interpretation and use of results
Validity Framework
• Phase I: Defining measurement targets, item and test development
  • 1-A: The purposes of the assessments are clear to all stakeholders.
    Relevant standards: 1.1
  • 1-B: Test specifications and design documents are clear about what knowledge and skills are to be assessed, the scope of the domain, the definition of competence, and the claims the assessments will be used to support.
    Relevant standards: 1.2, 3.1, 3.3
  • 1-C: Items are free of bias and accessible.
    Relevant standards: 7.4, 7.7, 9.1, 9.2, 10.1
  • 1-D: Items measure the intended constructs and elicit behavior that can be used as evidence in supporting the intended claims.
    Relevant standards: 1.1, 1.8, 13.3
Validity Framework
• Phase I: Defining measurement targets, item and test development
• Sources/Evidence of Procedural Validity for Phase I
  • Performance-Level Descriptors (PLDs)
    o Supported conditions/outcome: 1-B (scope of domain)
• Sources/Evidence of Empirical Validity for Phase I
  • Study 4: Use of Evidence-Based Selected Response Items in Measuring Reading Comprehension
    o Supported conditions/outcome: 1-D (intended constructs)
    o Source of validity evidence: Response processes
Validity Framework
PARCC Validity Framework is described in more detail in
Dogan, E., & Hauger, J. (in press). Empirical and procedural validity evidence in development and implementation of PARCC assessments. In R. W. Lissitz (Ed.), The next generation of testing: Common Core standards, Smarter-Balanced, PARCC, and the nationwide testing movement. Charlotte, NC: Information Age Publishing.
Validity Evidence
• Evidence and Design Implications Required to Support Comparability Claims by Richard M. Luecht (The University of North Carolina at Greensboro) and Wayne J. Camara (The College Board)
• Combining Multiple Indicators by Lauress L. Wise (HumRRO)
• Issues Associated with Vertical Scales for PARCC Assessments by Michael J. Kolen (The University of Iowa)
• Making Inferences about Growth and Value-Added: Design Issues for the PARCC Consortium by Derek Briggs (University of Colorado at Boulder)
• Defining and Measuring College and Career Readiness and Informing the Development of Performance Level Descriptors (PLDs) by Wayne Camara (College Board) and Rachel Quenemoen (National Center on Educational Outcomes)
• Scores and Scales: Considerations for PARCC Assessments by Michael J. Kolen (University of Iowa)
• Scaling PARCC Assessments: Some Considerations and a Synthetic Data Example by Robert L. Brennan (University of Iowa)
PARCC TAC
TAC Webinars in 2015
February 2015
• Field Test Analyses

March 2015
• Device Comparability Study

April 2015
• IRT Analyses
• Mode Comparability Study

May 2015
• Scale Properties
• International Benchmarking Study (Content Alignment)
• Data Forensics
• End-of-Course Comparability Study
• PARCC Test Design Change

June 2015
• PARCC CCR Policy