
Page 1

Getting More from Your Technical Advisory Committee: Designing and Implementing a Validity Research Agenda

2015 National Conference on Student Assessment

Enis Dogan

Page 2

PARCC Overview

• PARCC development and implementation
• TAC overview
• Determining the agenda
  • Research and Psychometrics Committee
  • Other Working Groups and PARCC State Leads
  • Guiding documents
  • Validity framework
  • Psychometric Roadmap


Page 3

PARCC Overview

• Development initiated in 2011
• First research conducted in 2012
• Field trial in 2013
• Field tests in 2014
• First operational assessments in 2014-15 school year in:


• Arkansas
• Colorado
• District of Columbia
• Illinois
• Louisiana
• Maryland
• Massachusetts
• Mississippi
• New Jersey
• New Mexico
• Ohio
• Rhode Island

Page 4

PARCC TAC

• Henry Braun (Boston College)

• Bob Brennan (University of Iowa)

• Derek Briggs (University of Colorado at Boulder)

• Linda Cook (Retired, ETS)

• Ronald Hambleton (University of Massachusetts, Amherst)

• Gerunda Hughes (Howard University)

• Huynh Huynh (University of South Carolina)

• Michael Kolen (University of Iowa)


• Suzanne Lane (University of Pittsburgh)

• Richard Luecht (University of North Carolina at Greensboro)

• Jim Pellegrino (University of Illinois at Chicago)

• Barbara Plake (University of Nebraska-Lincoln)

• Rachel Quenemoen (National Center on Educational Outcomes)

• Laurie Wise (Human Resources Research Organization, HumRRO)

Provides guidance on assessment design and development, and on the research agenda of the consortium.

Page 5

Determining the agenda

• Research and Psychometrics Committee
• Other Working Groups and PARCC State Leads
• Guiding documents
• Psychometric Roadmap
• Validity framework


Page 6

• Lists psychometric assumptions and decisions and provides a road map for making decisions on pending issues.
• The psychometric issues are categorized as follows:
  • PARCC Scaling Approach and Reporting Scale Characteristics
  • Claims and Subclaims Reporting
  • Scale Construction and Properties
  • Item Response Theory (IRT) Modeling
  • Mode and Device Comparability
  • Data Forensics
  • Linking Considerations

Psychometric Roadmap

Page 7

Psychometric Workplan Issues

• Determine properties of the primary (summative) reporting scale
• Determine number of digits for reported scale scores
• Establish rules defining the lowest and highest reported scale scores
• Determine how cut scores will be reported across performance levels, grades, and subjects (i.e., determine scale anchors)
• Determine how transformations from raw scores to scale scores will be carried out

Assumptions and Decisions

• The college- and career-ready (CCR) cut score will be fixed so that the same value indicates CCR performance across all grades and content areas

References for Assumptions/Decisions

• PARCC Scale Score Brief (Sept 2014): “PARCC Score Scale Brief_090314.docx”

Outstanding Questions
• What is the range of the summative scale scores? (Evaluating the conditional standard errors of measurement, CSEMs, may help inform this decision)
• What are the lowest and highest obtainable scale scores (LOSS and HOSS) for the summative scores?
• What are the LOSS and HOSS for the sub-scores in reading and writing?

(A minimal sketch of a scale transformation with LOSS/HOSS truncation follows this slide.)

Psychometric Roadmap
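To make the scaling questions above concrete, here is a minimal sketch, in Python, of how an ability estimate might be mapped to a reported scale score with LOSS/HOSS truncation. The linear form, the slope and intercept, and the 650-850 bounds are illustrative assumptions, not PARCC's operational values or procedure.

```python
# Hypothetical illustration only: a linear transformation from an IRT ability
# estimate (theta) to a reported scale score, truncated at LOSS and HOSS.
# The slope, intercept, and 650/850 bounds are placeholder values.

def theta_to_scale(theta, slope=37.5, intercept=750.0, loss=650, hoss=850):
    scale = slope * theta + intercept      # linear transformation to the reporting metric
    scale = max(loss, min(hoss, scale))    # enforce lowest/highest obtainable scale scores
    return round(scale)                    # report scores as whole numbers

if __name__ == "__main__":
    for theta in (-3.2, -0.5, 0.0, 1.4, 3.5):
        print(f"theta = {theta:5.1f}  ->  scale score = {theta_to_scale(theta)}")
```

Under this kind of setup, the LOSS/HOSS and the number of reported digits are policy choices layered on top of the psychometric scaling, which is why they appear as separate decisions in the roadmap.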

Page 8

Task | Deadline
Policy decision about number of digits for PARCC reporting scales | 3/31/2015
PARCC scaling approach presentation and discussion with RAP | 4/2/2015
Scaling approach discussion at research planning meeting | 4/29/2015
Scaling consideration presentation to TAC | 6/17/2015
Policy decision about the properties of the reading and writing sub-score scales | 6/30/2015
Simulations with spring 2015 operational data | 7/27/2015
Performance level setting meeting for high school | 7/27/2015
Governing Board approves standards and summative scales for high school | 8/14/2015
Performance level setting meeting for grades 3-8 | 8/24/2015
Governing Board approves standards and summative scales for grades 3-8 | 9/11/2015

Psychometric Roadmap

Page 9

• In terms of planning and executing research studies to collect empirical validity evidence, the first step is to build and follow a framework around which the studies can be organized. Lack of connectedness among validity studies is a challenge in many assessment programs (Haladyna, 2006).
• We built our framework by first dividing the assessment development and implementation period into four phases:
  • Phase I: Defining measurement targets, item and test development
  • Phase II: Test delivery and administration
  • Phase III: Scoring, scaling, standard setting
  • Phase IV: Reporting, interpretation, and use of results

Validity Framework

Page 10

• Phase I: Defining measurement targets, item and test development
  • 1-A: The purposes of the assessments are clear to all stakeholders.
    Relevant standards: 1.1
  • 1-B: Test specifications and design documents are clear about what knowledge and skills are to be assessed, the scope of the domain, the definition of competence, and the claims the assessments will be used to support.
    Relevant standards: 1.2, 3.1, 3.3
  • 1-C: Items are free of bias and accessible.
    Relevant standards: 7.4, 7.7, 9.1, 9.2, 10.1
  • 1-D: Items measure the intended constructs and elicit behavior that can be used as evidence in supporting the intended claims.
    Relevant standards: 1.1, 1.8, 13.3

Validity Framework

Page 11

• Phase I: Defining measurement targets, item and test development

Sources/Evidence of Procedural Validity for Phase I
• Performance-Level Descriptors (PLDs)
  o Supported conditions/outcome: 1-B (scope of domain)

Sources/Evidence of Empirical Validity for Phase I
• Study 4: Use of Evidence-Based Selected Response Items in Measuring Reading Comprehension
  o Supported conditions/outcome: 1-D (intended constructs)
  o Source of validity evidence: Response processes

(A sketch of how these condition-to-evidence mappings might be tracked follows this slide.)

Validity Framework
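As a purely illustrative sketch (not a PARCC artifact), the framework's bookkeeping of conditions, relevant Standards, and collected evidence could be tracked with a small data structure like the one below; the condition IDs and evidence entries come from the slides above, while the structure and field names are hypothetical.

```python
# Hypothetical tracking structure for the validity framework: each condition
# carries its phase, the relevant Standards, and the evidence gathered so far.
# Entries mirror the slides above; the structure itself is illustrative.

conditions = {
    "1-B": {
        "phase": "I",
        "statement": "Test specifications define the domain, competence, and claims",
        "standards": ["1.2", "3.1", "3.3"],
        "evidence": [("procedural", "Performance-Level Descriptors (PLDs)")],
    },
    "1-C": {
        "phase": "I",
        "statement": "Items are free of bias and accessible",
        "standards": ["7.4", "7.7", "9.1", "9.2", "10.1"],
        "evidence": [],
    },
    "1-D": {
        "phase": "I",
        "statement": "Items measure the intended constructs and support the claims",
        "standards": ["1.1", "1.8", "13.3"],
        "evidence": [("empirical",
                      "Study 4: Evidence-Based Selected Response items (response processes)")],
    },
}

# Flag conditions that do not yet have any procedural or empirical evidence.
gaps = [cid for cid, c in conditions.items() if not c["evidence"]]
print("Conditions still lacking evidence:", gaps)
```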

Page 12

The PARCC Validity Framework is described in more detail in:

Dogan, E., & Hauger, J. (in press). Empirical and Procedural Validity Evidence in Development and Implementation of PARCC Assessments. In R. W. Lissitz (Ed.), The Next Generation of Testing: Common Core Standards, Smarter-Balanced, PARCC, and the Nationwide Testing Movement. Charlotte, NC: Information Age Publishing.

Validity Evidence


Page 13

• Evidence and Design Implications Required to Support Comparability Claims by Richard M. Luecht (The University of North Carolina at Greensboro) and Wayne J. Camara (The College Board)

• Combining Multiple Indicators by Lauress L. Wise (HumRRO)

• Issues Associated with Vertical Scales for PARCC Assessments by Michael J. Kolen (The University of Iowa)

• Making Inferences about Growth and Value-Added: Design Issues for the PARCC Consortium by Derek Briggs (University of Colorado at Boulder)

• Defining and Measuring College and Career Readiness and Informing the Development of Performance Level Descriptors (PLDs) by Wayne Camara (College Board) and Rachel Quenemoen (National Center on Educational Outcomes)

• Scores and Scales: Considerations for PARCC Assessments by Michael J. Kolen (University of Iowa)

• Scaling PARCC Assessments: Some Considerations and a Synthetic Data Example by Robert L. Brennan (University of Iowa)

PARCC TAC

Page 14

TAC Webinars in 2015

February 2015

• Field Test Analyses

March 2015

• Device Comparability Study

April 2015

• IRT Analyses
• Mode Comparability Study

May 2015

• Scale Properties

• International Benchmarking Study (Content Alignment)

• Data Forensics
• End-of-Course Comparability Study

• PARCC Test Design Change

June 2015

• PARCC CCR Policy
