Study of Device Comparability within the PARCC Field Test

Goal of Test Delivery

PARCC’s ultimate goal for test delivery
• Digital delivery of the PARCC ELA and mathematics assessments
  – On the widest variety of devices that will support interchangeable scores
  – E.g., desktop computers, laptops, and tablets

Fairness

Quantitative Comparability Study

• “Tablets” = full-size (10”) iPads
• One form of each of the following tests was chosen for administration on iPads:
  – Grade 4 ELA/Literacy and Mathematics
  – Grade 8 ELA/Literacy and Mathematics
  – Grade 10 ELA/Literacy
  – Geometry
• Selected “condition 1” forms so that the same students took both the PBA and EOY components of the selected forms

Research Questions

1. Do the individual items/tasks perform similarly across computers and tablets?
2. Are the psychometric properties of the test scores similar across computers and tablets?
3. Do students perform similarly on the overall test across computers and tablets?

Methodology

Data Collection Design

• Grade 8 and high school studies used random assignment of Burlington, MA students to computer and tablet conditions
  – Random assignment to conditions by homeroom or class section
• Grade 4 study used a matched sample from MA
  – Burlington students assigned to the tablet condition were matched to other MA students who tested on computer
  – Matching based on previous scores on the state assessment, the Massachusetts Comprehensive Assessment System (MCAS)
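
The two designs above can be sketched in a few lines of Python. This is an illustration only, not the study's actual procedure: the function names, the balanced half-and-half split of sections, the greedy nearest-neighbor matching rule, and all data values are assumptions made for the example. The source states only that assignment was random by homeroom or class section and that matching used prior MCAS scores.

```python
import random


def assign_sections(sections, seed=0):
    """Randomly assign whole sections (homerooms/classes) to a device condition.

    Assumption: half of the sections go to each condition. The source does
    not say whether the split was balanced.
    """
    rng = random.Random(seed)
    shuffled = list(sections)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        section: ("tablet" if i < half else "computer")
        for i, section in enumerate(shuffled)
    }


def match_on_prior_score(tablet, computer_pool):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Both arguments are lists of (student_id, prior_score) tuples.
    Returns a list of (tablet_id, computer_id) pairs.
    """
    available = list(computer_pool)
    pairs = []
    for tablet_id, tablet_score in tablet:
        if not available:
            break
        # Pick the unused computer student with the closest prior score.
        best = min(available, key=lambda s: abs(s[1] - tablet_score))
        available.remove(best)
        pairs.append((tablet_id, best[0]))
    return pairs


# Made-up section labels and prior scores, for illustration only.
conditions = assign_sections(["HR-A", "HR-B", "HR-C", "HR-D"], seed=1)

tablet_students = [("T1", 240), ("T2", 262), ("T3", 251)]
computer_students = [("C1", 238), ("C2", 250), ("C3", 270), ("C4", 263), ("C5", 244)]
matched = match_on_prior_score(tablet_students, computer_students)

print(conditions)
print(matched)
```

Assigning by section keeps each class on a single device, which is simpler to administer than student-level randomization. Greedy matching is the simplest matching rule and is order-dependent, so a real analysis would likely use a more careful method.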

Analysis Methods

• Item/Task Level Analysis
  – Comparison of p-values and item means
  – Analysis of IRT item difficulty estimates
• Component Level Analysis
  – Correlation between PBA and EOY scores
• Test Level Analysis
  – Reliability
  – Validity
  – Score Interpretation
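
Three of the classical statistics listed above can be illustrated as follows: item p-values (proportion correct) by device, a Pearson correlation between component scores, and a reliability coefficient. This is a sketch under stated assumptions. The source does not say which reliability coefficient was used, so Cronbach's alpha stands in here. All function names and data are hypothetical, and the IRT, validity, and score interpretation analyses are not covered.

```python
from statistics import mean, pvariance


def item_p_values(responses):
    """Proportion correct per item; responses is a list of per-student 0/1 lists."""
    n_items = len(responses[0])
    return [mean(student[i] for student in responses) for i in range(n_items)]


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def cronbach_alpha(responses):
    """Cronbach's alpha (an assumed choice of reliability coefficient)."""
    k = len(responses[0])
    item_vars = [pvariance([s[i] for s in responses]) for i in range(k)]
    total_var = pvariance([sum(s) for s in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up scored responses (rows = students, columns = items).
computer = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
tablet = [[1, 0, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]

# Item-level comparison: difference in p-value, computer minus tablet.
p_diff = [c - t for c, t in zip(item_p_values(computer), item_p_values(tablet))]

# Component-level comparison: made-up PBA and EOY scores for one device group.
pba = [10, 12, 15, 20]
eoy = [11, 13, 14, 22]
r_pba_eoy = pearson(pba, eoy)

# Test-level comparison: reliability within one device group.
alpha_computer = cronbach_alpha(computer)

print(p_diff, r_pba_eoy, alpha_computer)
```

In a device comparison, each statistic would be computed separately for the computer and tablet groups and the two sets of values compared. The criteria used to flag a device effect are not given in the source, so none are assumed here.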

Summary of Results

Observed Device Effect

• Grade 4 Mathematics
  – Device effect found for 18 of 51 (35%) items
  – Elementary students less familiar with taking mathematics tests online
  – Degree of success in matching samples for Grade 4
• Grade 8 Mathematics
  – Device effect found in component-level and reliability analysis
  – Highest number of items (29 of 67, or 43%) excluded from study

Observed Device Effect

• Grade 4 ELA
  – Device effect found in validity and score interpretation analysis
  – Elementary students less familiar with taking items/tasks that are not selected responses online
  – Degree of success in matching samples for Grade 4
• Consistent device effect across analyses was not observed for any of the tests in the study
  – Device effect was found for none of the analyses in Grade 8 ELA and Geometry

Conclusions and Implications

Conclusions

1. Do the individual items/tasks perform similarly across computers and tablets?
  o YES, for most items/tasks in the study
  o More items with device effect in Grade 4
    – Unfamiliarity with taking certain item types online for elementary school students
    – Degree of success in matched samples
  o Insufficient device effect items to draw conclusions about item features

Conclusions

2. Are the psychometric properties of the test scores similar across computers and tablets?
  o YES, for all but one test in this study
  o Exception: Grade 8 mathematics (component-level and reliability analyses)
    – Highest number of items excluded from study may have led to less stable correlation estimates

Conclusions

3. Do students perform similarly on the overall test across computers and tablets?
  o In general, YES: no consistent device effect was observed across analyses for any test in the study
  o Device effect found in score interpretation analysis for Grade 4 ELA
    – Unfamiliarity with taking non-selected response tasks online for elementary school students
    – Degree of success in matched samples

Implications

• Comparability of assessments administered on computers and tablets
  – No evidence of large or consistent differences in comparability was found in this study
  – Also supported by device comparability research conducted outside of PARCC (e.g., Davis, Orr, Kong, Lin, 2014; Olsen, 2014; Davis, Kong, McBride, 2015)
  – Further supported by policies in other large-scale assessment programs (e.g., SBAC and other statewide assessments)

Implications

• Item development and user interface design
  – Consider familiarity of younger students with nontraditional item types online
  – Additional focus groups and/or cognitive labs with elementary school students
  – Minimize the use of item features (e.g., drag and drop) that may lead to differential performance across computers and tablets
