Differential item functioning on the Desired Results Developmental Profile Assessment for preschool students with disabilities: what can (and should) we do about it? Joshua Sussman Postdoctoral Scholar Berkeley Evaluation and Assessment Research (BEAR) Center University of California, Berkeley

Differential item functioning on the Desired Results Developmental Profile Assessment ... · 2018-05-23 · Fairness in assessment • Measurement invariance is cast as an issue of




Outline

• Measurement invariance for students with disabilities
• The case of the DRDP (an observational, formative measure of early childhood development)
• DIF analysis and interpretation
• What should we do about DIF?


Fairness in assessment

• Measurement invariance is cast as an issue of fairness.
  • Comparable measurement across groups for fair and unbiased decision making (e.g., identical cut scores)

• “Disabilities can make it difficult for students to engage in the intended test response processes, leading to test scores that do not reflect their underlying skills” (Bolt & Ysseldyke, 2008).

• Special education eligibility categories for preschoolers, 2004-2008 (NCES):
  • ~50% speech and language
  • ~25% unspecified developmental disability
  • ~6-7% autism diagnosis


Desired Results Developmental Profile (DRDP)

• Multidimensional assessment of early childhood development
  • Approaches to Learning and Self-Regulation (ATL-REG)
  • Social and Emotional Development (SED)
  • Cognitive Development: Math and Science (COG)
  • Language and Literacy Development (LLD)
  • Physical Development and Health (PD-HLTH)

• Observational measure
• A strengths-based formative assessment: not for sorting, but for increasing opportunity to learn


Differential Item Functioning (DIF)

• DIF methods have been used to study measurement invariance for students with disabilities (Scarpati, Wells, Lewis, & Jirka, 2011).

• DIF methods include classical test theory (CTT) methods based on contingency tables and IRT-based approaches (Millsap & Everson, 1993).

• Common characteristics of some DIF studies (Ferne & Rupp, 2007):
  • Single method (sensitive to uniform DIF)
  • Aim for homogeneous grouping variables
  • Matching on the estimated latent variable (IRT studies)
  • Report on model and item fit (unidimensionality, conditional independence, etc.)
  • Interpret the statistical and practical significance of DIF
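To illustrate the contingency-table family, here is a minimal Mantel-Haenszel sketch for a dichotomous item. It assumes children have already been matched on total score, giving one 2×2 table per score level. `mantel_haenszel` is a hypothetical helper, not part of TAM or any particular package. DRDP items are polytomous, so a real analysis would need a polytomous extension of this statistic.

```python
import math


def mantel_haenszel(strata):
    """Common odds ratio and ETS delta (MH D-DIF) across matched score strata.

    Each stratum is (A, B, C, D): reference correct, reference incorrect,
    focal correct, focal incorrect. alpha > 1 (negative delta) means the item
    favors the reference group after matching on score.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


# Hypothetical counts for two score levels.
print(mantel_haenszel([(30, 10, 20, 20), (15, 5, 10, 10)]))
```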


This study: Methods

• Unidimensional IRT-based approach to DIF detection
  • Masters (1982) partial credit model (PCM)
  • Dimensionality examined previously

• Package ‘TAM’ in R (Robitzsch, Kiefer, & Wu, 2018)
  • Facet model: item - sped + item*sped + item*step*sped

• N = 135,946 children enrolled in California preschools
  • Focal group: eligible for special education (n = 9,258)
  • Reference group: general education (n = 126,688)
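The facet terms are easier to follow with a concrete example. The sketch below implements the partial credit model with a uniform DIF shift. It is not TAM's implementation: `pcm_probs`, `dif`, and `focal` are hypothetical names. It assumes the item*sped term moves every step of an item by the same number of logits for focal-group children (positive means harder). The item*step*sped term would instead let each step shift by a different amount.

```python
import math
from typing import Sequence


def pcm_probs(theta: float, delta: float, taus: Sequence[float],
              dif: float = 0.0, focal: bool = False) -> list[float]:
    """Category probabilities P(X = 0..m) under Masters' (1982) partial credit model.

    Step k has difficulty delta + taus[k-1]. For a focal-group child every step is
    shifted by `dif` logits (hypothetical uniform item-by-group DIF; positive = harder).
    """
    shift = dif if focal else 0.0
    # Category 0 has log-numerator 0; category k adds (theta - step difficulty).
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + theta - (delta + tau + shift))
    top = max(log_num)  # subtract the max before exponentiating, for stability
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs: Sequence[float]) -> float:
    """Model-expected item score: sum over k of k * P(X = k)."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical 4-category item, same child ability, without and with +0.5 logit DIF.
taus = [-1.0, 0.0, 1.0]
ref = pcm_probs(theta=0.0, delta=0.0, taus=taus)
foc = pcm_probs(theta=0.0, delta=0.0, taus=taus, dif=0.5, focal=True)
print(round(expected_score(ref), 3), round(expected_score(foc), 3))
```

At equal ability, a positive DIF shift lowers the focal child's expected rating, which is what "harder for students in special education" means on the item plots.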


Exploring the Rasch PCM (No DIF)

WLE reliability = 0.945

Figures (not reproduced): Wright map; proficiency distribution (WLE); conditional SEM


Results: DIF Model

• Model fit: likelihood-ratio (LR) test of the PCM vs. the DIF PCM

             ATL-REG   SED      COG      LLD      PD-HLTH
Chi-square   1215      2172     2324     4895     2220
df           28        35       64       69       73
p            < .001    < .001   < .001   < .001   < .001

• AIC, BIC, CAIC, and similar indices tell the same story.
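The p-values in this table are far below any conventional cutoff. The sketch below computes them in log space and also converts each LR statistic into information-criterion differences. It rests on two assumptions: the DIF model adds exactly df parameters, and BIC uses the 135,946 children as its sample size. `log_chi2_sf` and `delta_ic` are hypothetical helpers. The chi-square tail is computed from the standard series and continued-fraction forms of the regularized incomplete gamma function.

```python
import math


def log_chi2_sf(stat: float, df: int) -> float:
    """Natural log of P(X >= stat) for X ~ chi-square(df).

    Uses the regularized upper incomplete gamma Q(a, x) with a = df/2 and
    x = stat/2, kept in log space so huge statistics do not underflow to 0.
    """
    a, x = df / 2.0, stat / 2.0
    if x <= 0.0:
        return 0.0  # p = 1
    log_prefix = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower regularized gamma P(a, x); then Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return math.log1p(-math.exp(log_prefix + math.log(total)))
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return log_prefix + math.log(h)


def delta_ic(lr: float, df: int, n: int) -> tuple[float, float]:
    """(AIC, BIC) of the PCM minus those of the DIF PCM; positive favors DIF.

    Assumes the DIF model adds exactly `df` parameters and BIC uses n persons.
    """
    return lr - 2 * df, lr - df * math.log(n)


LR_TESTS = {  # dimension: (chi-square, df), copied from the table above
    "ATL-REG": (1215, 28), "SED": (2172, 35), "COG": (2324, 64),
    "LLD": (4895, 69), "PD-HLTH": (2220, 73),
}

for dim, (chi2, df) in LR_TESTS.items():
    log10_p = log_chi2_sf(chi2, df) / math.log(10)
    d_aic, d_bic = delta_ic(chi2, df, n=135_946)
    print(f"{dim}: log10(p) = {log10_p:.1f}, dAIC = {d_aic:.0f}, dBIC = {d_bic:.0f}")
```

Because this sample is so large, even BIC's heavy per-parameter penalty does not offset these LR statistics. That is one reason effect sizes, not only fit tests, are needed to judge whether the DIF matters.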


Results: Item Fit (under construction)


Estimated differences in group proficiency between special and general education students (logits)

Dimension   Difference   SE      Difference × 2
ATL-REG      0.285       0.002    0.571
SED          0.490       0.002    0.981
COG          0.352       0.001    0.705
LLD          0.356       0.001    0.711
PD-HLTH     -0.021       0.001   -0.042

Positive values indicate higher ability in the general education group.
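With standard errors this small, every difference in the table is statistically distinguishable from zero, including the PD-HLTH estimate near zero. The sketch below computes a Wald z and a 95% interval for each dimension. It assumes the reported SE belongs to the Difference column and not to the doubled column. `wald_summary` is a hypothetical helper.

```python
from statistics import NormalDist

GROUP_DIFFS = {  # dimension: (difference in logits, SE), copied from the table above
    "ATL-REG": (0.285, 0.002), "SED": (0.490, 0.002), "COG": (0.352, 0.001),
    "LLD": (0.356, 0.001), "PD-HLTH": (-0.021, 0.001),
}


def wald_summary(est: float, se: float, level: float = 0.95):
    """Wald z statistic and symmetric normal-theory confidence interval."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return est / se, (est - z_crit * se, est + z_crit * se)


for dim, (est, se) in GROUP_DIFFS.items():
    z, (lo, hi) = wald_summary(est, se)
    print(f"{dim}: z = {z:.0f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```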


Difference between SpEd and non-SpEd

Figure (not reproduced): DIF plot with regions labeled “Easier for those in special education” and “Harder for those in special education”

(Under construction: add effect sizes from Paek & Wilson.)
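Until those effect sizes are added, here is one widely used convention for labeling DIF magnitude, sketched below. The ETS A/B/C categories use cut points of 1.0 and 1.5 on the delta scale, where Δ = -2.35 ln(α). This sketch assumes a logit-scale DIF estimate can be read as an approximate log odds ratio, which puts the cuts at about 0.426 and 0.638 logits. The Paek & Wilson effect sizes planned for this slide may be defined differently. `classify_dif` is a hypothetical helper, and it omits the significance tests that the full ETS rules require.

```python
ETS_DELTA_PER_LOGIT = 2.35  # ETS delta metric: Delta_MH = -2.35 * ln(alpha_MH)


def classify_dif(dif_logit: float) -> str:
    """ETS-style A/B/C label for a DIF estimate expressed in logits.

    Cut points of 1.0 and 1.5 on the delta scale are about 0.426 and 0.638 logits.
    Simplification: the significance tests in the full ETS rules are omitted.
    """
    size = abs(dif_logit) * ETS_DELTA_PER_LOGIT
    if size < 1.0:
        return "A (negligible)"
    if size < 1.5:
        return "B (moderate)"
    return "C (large)"


for est in (0.2, 0.5, -0.7):  # hypothetical item DIF estimates in logits
    print(est, classify_dif(est))
```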


Largest negative-DIF item (easier for students in special education)

• There is an established link between language disorders (50% of the SpEd sample) and difficulties with attention and executive functioning (Mueller & Tomblin, 2012).

Item infit = 0.93


Largest positive-DIF item (harder for SpEd students)

• Fine motor delays commonly accompany a variety of disabilities, including difficulties with speech and language (Brookman, McDonald, McDonald, & Bishop, 2013).

Item infit = 0.97
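The infit values reported for these two items (0.93 and 0.97) are information-weighted mean squares. Values near 1 mean the observed ratings vary about as much as the model expects. Below is a minimal sketch of the usual formula. It assumes model-implied category probabilities are already available for each child, for example from a PCM. `infit_mnsq` is a hypothetical helper, not a TAM function.

```python
def infit_mnsq(observed, category_probs):
    """Information-weighted (infit) mean square for one item.

    observed: observed category score x_p (0..m) for each person
    category_probs: model probabilities P(X = 0..m) for each person
    Infit = sum_p (x_p - E_p)^2 / sum_p Var_p
    """
    num = den = 0.0
    for x, probs in zip(observed, category_probs):
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        num += (x - expected) ** 2
        den += variance
    return num / den


# Hypothetical dichotomous example: two children, each with P(X = 1) = 0.5.
print(infit_mnsq([0, 1], [[0.5, 0.5], [0.5, 0.5]]))  # prints 1.0
```

Values below 1, like those on these slides, indicate slightly less variation than the model predicts. Values well above 1 indicate noisier ratings than expected.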


What to do about items displaying DIF?

• Remove
  • Politically difficult
  • May impact the construct

• Revise
  • Change the item prompt or anchors (raises the same issues as removal)
  • Change the rater training

• Leave
  • Additional psychometric development
    • Produce different tests for different groups (construct representation issues)
    • Model dimensionality, rater effects, and other covariates


Thanks!


Appendix


No step DIF model


Item*sped*step
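The two appendix models can be written in partial-credit form. This is a sketch under assumed notation: θ_p is ability, δ_i is item location, τ_ij are step parameters, and g_p = 1 for children eligible for special education. It treats the item*sped term as a uniform item shift γ_i and the item*step*sped term as step-specific shifts λ_ij. TAM's exact parameterization and identification constraints may differ.

```latex
% No step DIF (item x sped): one DIF shift per item
P(X_{pi}=k \mid \theta_p) =
  \frac{\exp \sum_{j=1}^{k} \left(\theta_p - \delta_i - \tau_{ij} - \gamma_i g_p\right)}
       {\sum_{h=0}^{m_i} \exp \sum_{j=1}^{h} \left(\theta_p - \delta_i - \tau_{ij} - \gamma_i g_p\right)},
  \qquad \sum_{j=1}^{0}(\cdot) \equiv 0

% Step DIF (item x sped x step): the shift may differ by step
\gamma_i\, g_p \;\longrightarrow\; (\gamma_i + \lambda_{ij})\, g_p
```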


DIF

• For test administrations to be considered valid for all student groups, measurement must be comparable across groups. This helps ensure that decisions based on test results are made fairly for all students.

• Examinations of score comparability across student groups and testing conditions are considered an important piece of evidence that a test will lead to fair and unbiased decision making (AERA, APA, & NCME, 1999; Braden & Niebling, 2006).

• Several empirical studies of accommodations provided to students with physical disabilities were conducted in the 1980s using this approach (Bennett, Rock, & Jirele, 1987; Bennett, Rock, & Kaplan, 1987; Bennett, Rock, & Novatkoski, 1989; Rogers, 1983). In general, these researchers found that accommodated test administrations for students with sensory or physical disabilities showed limited DIF. More recently, several research teams have used this approach to examine the validity of accommodations for students with mental disabilities. Some have used factor analysis to examine measurement comparability (Huynh & Barton, 2006; Pomplun & Omar, 2000), and others have used DIF analysis (Bolt & Ysseldyke, 2006; Lewis, Green, & Miller, 1999). Their results vary in the extent of measurement comparability they identified.


• Examining SpEd DIF
  • Why DIF matters for SpEd
  • DRDP: assessment of early childhood development
  • SpEd among preschoolers in particular
    • 50% SLI, 25% “developmental disability,” 6% autism (NCER)

• DIF methods in the literature
  • DIF assessment
  • Results: DIF
    • Model fit
    • Item fit = OK
    • DIF plot

• Interpreting the DIF
  • Pros and cons

• What to do about the DIF
  • This is where it gets real: these assessments are in use right now.


DRDP and students with disabilities

• What is the DRDP?
• Students with disabilities
• The DRDP supports all students, including students with disabilities


Students with disabilities

• “Disabilities can make it difficult for students to engage in the intended test response processes, leading to test scores that do not reflect their underlying skills” (Bolt & Ysseldyke, 2008).
• Preschool eligibility is typically for severe issues.


Methods

• Sample
  • N =
  • SpEd n =


Equation

• Equation 1 represents the baseline comparison model and includes only the category threshold and a slope term for the theta estimate. DIF was evaluated by comparing the fit (−2 log likelihood) between Equation 1 and Equation 2 (uniform DIF) and between Equations 2 and 3 (nonuniform DIF). If the differences in model fit were statistically significant, then DIF was detected.
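The three nested equations are not reproduced on the slide. One standard way to write them follows the ordinal logistic regression DIF framework (Zumbo, 1999). It assumes a cumulative-logit link, θ_p as the matching ability estimate, and g_p as the group indicator (1 = special education). Each comparison below is then a 1-df likelihood-ratio test for item i.

```latex
\begin{align*}
\text{(1)}\quad \operatorname{logit} P(Y_{pi} \ge k) &= \alpha_{ik} + \beta_{1i}\,\theta_p \\
\text{(2)}\quad \operatorname{logit} P(Y_{pi} \ge k) &= \alpha_{ik} + \beta_{1i}\,\theta_p + \beta_{2i}\,g_p \\
\text{(3)}\quad \operatorname{logit} P(Y_{pi} \ge k) &= \alpha_{ik} + \beta_{1i}\,\theta_p + \beta_{2i}\,g_p + \beta_{3i}\,\theta_p g_p \\[4pt]
G^2_{\text{uniform}} &= (-2\log L_1) - (-2\log L_2) \sim \chi^2_1 \\
G^2_{\text{nonuniform}} &= (-2\log L_2) - (-2\log L_3) \sim \chi^2_1
\end{align*}
```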


Evaluation

• Model fit
• Effect sizes were also considered when determining the meaningfulness of DIF.


Results

• n/% of items (steps?) with small DIF
• n/% with moderate and large DIF

• n/% of items with acceptable fit


Purify

• The impact of DIF was evaluated by comparing person estimates (i.e., theta estimates) from a model including all items with those from a model adjusted for DIF. Adjusting for DIF is sometimes referred to as purifying or resolving items that exhibit DIF (Zumbo, 1999). To purify an item that exhibits DIF, the model was adjusted to include group-specific IRT parameters for that item (i.e., separate item characteristic curves were estimated for each group). The resulting purified theta estimates were anchored to the non-DIF items, providing an estimate that is free of DIF. Theta estimates for the two models were equated.

• To test the individual-level impact of DIF, each student’s naive theta estimate (from the model based on all items) was compared with their purified theta estimate. Differences greater than the median standard error of the naive theta estimates were considered to represent meaningful individual-level impact of DIF (Choi et al., 2011). Differences were also compared with each student’s own naive standard error (i.e., the uncertainty of the initial score)…

• ALSO SHOW CATEGORIES
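The individual-level comparison described above can be sketched as follows. The sketch assumes naive and purified theta estimates have already been equated to a common scale and are listed in the same child order. `dif_impact` is a hypothetical helper. It applies both criteria from the slide: the median naive SE (Choi et al., 2011) and each child's own naive SE.

```python
from statistics import median


def dif_impact(naive, purified, naive_se):
    """Flag children whose score moves meaningfully once DIF items are purified.

    Returns the median-SE threshold and, per child, a tuple of
    (purified - naive, exceeds median naive SE, exceeds own naive SE).
    """
    threshold = median(naive_se)
    results = []
    for n, p, se in zip(naive, purified, naive_se):
        diff = p - n
        results.append((diff, abs(diff) > threshold, abs(diff) > se))
    return threshold, results


# Hypothetical estimates for three children.
thr, res = dif_impact([0.0, 1.0, -0.5], [0.1, 1.5, -0.5], [0.2, 0.3, 0.25])
print(thr, sum(flag for _, flag, _ in res), "child(ren) above the median-SE threshold")
```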


Good development

• Working with EESD


• DIF
  • What are the items?
  • What is the profile of students who are identified with disabilities as preschoolers? Set this up at the beginning or the end?
  • Fine motor
  • Activity level
• Two stories: most items show no DIF, and the items that do show DIF reflect known issues.
  • Examining the comorbidity of language disorders and ADHD (50% of those identified are SLI, with large comorbidity); 25% are reported as having a developmental delay, fine motor and activity level being more

• Pre-Elementary Education Longitudinal Study (PEELS)