Study of Device Comparability within the PARCC Field Test
PARCC’s ultimate goal for test delivery
• Digital delivery of the PARCC ELA and mathematics assessments
  – On the widest variety of devices that will support interchangeable scores
  – E.g., desktop computers, laptops, and tablets
Goal of Test Delivery
Fairness
• “Tablets” = full-size (10”) iPads
• One form of each of the following tests was chosen for administration on iPads:
  – Grade 4 ELA/Literacy and Mathematics
  – Grade 8 ELA/Literacy and Mathematics
  – Grade 10 ELA/Literacy
  – Geometry
• Selected “condition 1” forms so that the same students took both the PBA and EOY components of the selected forms
Quantitative Comparability Study
1. Do the individual items/tasks perform similarly across computers and tablets?
2. Are the psychometric properties of the test scores similar across computers and tablets?
3. Do students perform similarly on the overall test across computers and tablets?
Research Questions
Methodology
• Grade 8 and high school studies used random assignment of Burlington, MA students to computer and tablet conditions
  – Random assignment to conditions by homeroom or class section
• Grade 4 study used matched sample from MA
  – Burlington students assigned to tablet condition matched to other MA students who tested on computer
  – Matching based on previous scores on state assessment, Massachusetts Comprehensive Assessment System (MCAS)
Data Collection Design
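The slides do not say how the Grade 4 matching was carried out beyond its use of prior MCAS scores. As a rough illustration only, the sketch below assumes greedy one-to-one nearest-neighbor matching on a single prior score with a caliper. The `Student` class, the `caliper` value, and all scores are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Student:
    student_id: str
    prior_score: float  # hypothetical prior-year MCAS score


def match_nearest(tablet, computer_pool, caliper=5.0):
    """Greedy 1:1 nearest-neighbor matching on prior score, without replacement.

    Assumption: a tablet student is matched only if the closest remaining
    computer student is within `caliper` score points; otherwise the tablet
    student is left unmatched.
    """
    available = list(computer_pool)
    pairs = []
    for t in sorted(tablet, key=lambda s: s.prior_score):
        if not available:
            break
        best = min(available, key=lambda c: abs(c.prior_score - t.prior_score))
        if abs(best.prior_score - t.prior_score) <= caliper:
            pairs.append((t, best))
            available.remove(best)
    return pairs


# Hypothetical data: three tablet students, four candidate computer students.
tablet = [Student("T1", 240.0), Student("T2", 252.0), Student("T3", 230.0)]
computer_pool = [
    Student("C1", 238.0),
    Student("C2", 251.0),
    Student("C3", 260.0),
    Student("C4", 200.0),
]

pairs = match_nearest(tablet, computer_pool)
matched_ids = [(t.student_id, c.student_id) for t, c in pairs]
print(matched_ids)
```

In this toy data one tablet student has no computer student within the caliper and stays unmatched, which is one way the "degree of success in matching" can limit a matched-sample comparison.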
• Item/Task Level Analysis
  – Comparison of p-values and item means
  – Analysis of IRT item difficulty estimates
• Component Level Analysis
  – Correlation between PBA and EOY scores
• Test Level Analysis
  – Reliability
  – Validity
  – Score Interpretation
Analysis Methods
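To make the item-level comparison concrete, the sketch below computes classical item p-values (proportion correct) for each device group and flags items whose difference is large under a two-proportion z-test. The slides do not state the criterion used to flag a device effect, so the z-test, the 1.96 threshold, and the response data here are all assumptions for illustration.

```python
import math


def p_value(responses):
    """Classical item difficulty: proportion of 0/1 responses that are correct."""
    return sum(responses) / len(responses)


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between two proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se if se > 0 else 0.0


def flag_items(computer, tablet, z_crit=1.96):
    """Return item names whose computer/tablet p-values differ by |z| > z_crit."""
    flagged = []
    for item in computer:
        c, t = computer[item], tablet[item]
        z = two_proportion_z(sum(c), len(c), sum(t), len(t))
        if abs(z) > z_crit:
            flagged.append(item)
    return flagged


# Hypothetical 0/1 responses, 100 students per device condition.
computer = {
    "item_a": [1] * 70 + [0] * 30,
    "item_b": [1] * 80 + [0] * 20,
}
tablet = {
    "item_a": [1] * 68 + [0] * 32,
    "item_b": [1] * 55 + [0] * 45,
}

for item in computer:
    print(item, p_value(computer[item]), p_value(tablet[item]))
print("flagged:", flag_items(computer, tablet))
```

A real analysis would typically also adjust for multiple comparisons and look at effect sizes, since with many items some differences will exceed any fixed threshold by chance.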
Summary of Results
• Grade 4 Mathematics
  – Device effect found for 18 of 51 (35%) items
  – Elementary students less familiar with taking mathematics tests online
  – Degree of success in matching samples for Grade 4
• Grade 8 Mathematics
  – Device effect found in component-level and reliability analysis
  – Highest number of items (29 of 67, or 43%) excluded from study
Observed Device Effect
• Grade 4 ELA
  – Device effect found in validity and score interpretation analysis
  – Elementary students less familiar with taking non-selected-response items/tasks online
  – Degree of success in matching samples for Grade 4
• Consistent device effect across analyses was not observed for any of the tests in the study
  – No device effect was found in any of the analyses for Grade 8 ELA and Geometry
Observed Device Effect
Conclusions and Implications
1. Do the individual items/tasks perform similarly across computers and tablets?
   o YES, for most items/tasks in the study
   o More items with device effect in Grade 4
     – Unfamiliarity with taking certain item types online for elementary school students
     – Degree of success in matched samples
   o Insufficient device effect items to draw conclusions about item features
Conclusions
2. Are the psychometric properties of the test scores similar across computers and tablets?
   o YES, for all but one test in this study
   o Exception: Grade 8 mathematics (component-level and reliability analyses)
     – Highest number of items excluded from study may have led to less stable correlation estimates
Conclusions
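The component-level and reliability analyses behind this conclusion can be pictured with two standard statistics: the Pearson correlation between PBA and EOY scores, and an internal-consistency reliability coefficient. The slides do not name the reliability coefficient used, so Cronbach's alpha is assumed in the sketch below, and all scores are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of item scores per student."""
    k = len(item_scores[0])
    item_vars = [sample_variance([s[i] for s in item_scores]) for i in range(k)]
    total_var = sample_variance([sum(s) for s in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical PBA and EOY component scores for one device condition.
pba = [12, 15, 9, 20, 17, 11]
eoy = [14, 18, 10, 22, 16, 12]
print("PBA-EOY correlation:", round(pearson(pba, eoy), 3))

# Hypothetical item-level scores (rows = students, columns = items).
items = [
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
print("alpha:", round(cronbach_alpha(items), 3))
```

Running the same two functions separately on the computer and tablet groups and comparing the results would mirror the kind of comparison the slide describes. With fewer items retained, both estimates rest on less information, which is consistent with the slide's note about less stable correlation estimates.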
3. Do students perform similarly on the overall test across computers and tablets?
   o In general, YES: no consistent device effect was observed across analyses for any test in study
   o Device effect found in score interpretation analysis for Grade 4 ELA
     – Unfamiliarity with taking non-selected response tasks online for elementary school students
     – Degree of success in matched samples
Conclusions
• Comparability of assessments administered on computers and tablets
  – No evidence of large or consistent differences between devices was found in this study
  – Also supported by device comparability research conducted outside of PARCC (e.g., Davis, Orr, Kong, Lin, 2014; Olsen, 2014; Davis, Kong, McBride, 2015)
  – Further supported by policies in other large-scale assessment programs (e.g., SBAC and other statewide assessments)
Implications
• Item development and user interface design
  – Consider familiarity of younger students with nontraditional item types online
  – Additional focus groups and/or cognitive labs with elementary school students
  – Minimize the use of item features (e.g., drag and drop) that may lead to differential performance across computers and tablets
Implications