Stephen G. Sireci, University of Massachusetts Amherst; Sireci Psychometric Services, Inc.
Presentation at the National Conference on Student Assessment as part of the symposium Validating Claims of College and Career Readiness (T. Patelis, Chair), June 23, 2015, San Diego, CA
Are We Ready for College and Career Readiness?
College and Career Readiness Standards
• An integral part of Common Core, other state standards, and education reform initiatives
• Many examples
  – PARCC, SMARTER Balanced, MO, TX, VA
• Not to mention norm-referenced tests such as
  – ACT, SAT, PSAT, Compass, Explore, Plan
College and Career Readiness Standards
Many examples (continued)
• And don’t forget criterion-referenced tests
  – AP
• And criteria based on multiple indicators
  – College Board (2010)
  – NCES (2007)
The next slide, prepared with assistance from Duy Pham, shows some current examples.
Test-Based College Readiness Benchmarks
But what do these benchmarks mean?
• And how are they set?
• And are they
  – Accurate?
  – Useful?
  – Valid?
Mattern et al. (2014)
“Differences among the various definitions of CCR often lead to different conclusions about whether a student is ready for college and a career...These definitions vary on many factors, such as the indicators included, the operational definition of college success…and the model used for reporting CCR” (pp. 3-4).
Purposes of this presentation
1. Discuss methods used for setting CCR benchmarks
2. Discuss research design issues in setting CCR benchmarks
3. Discuss frameworks for validating CCR benchmarks
4. Provide recommendations for how college readiness benchmarks should be set, interpreted (described), and reported.
What methods could be used to set CCR benchmarks?
• “Qualitative”: standard setting panels (i.e., test-based standard setting).
• “Quantitative”: predictive or concurrent validity (linking, concordance) studies
Quantitative Research Designs
• Concurrent: Students take CCR assessment and external assessments (or courses) around same time
• Predictive: Students’ college GPA or other criteria gathered later (retrospective analysis)
• Linking studies: CCR items embedded in external assessments and/or vice-versa
• Projection: Map cut-score from external assessment onto CCR test scale using population and sampling assumptions
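As a rough illustration of mapping a cut score from one scale to another, the sketch below uses simple equipercentile matching: it finds the CCR score whose percentile rank matches that of the external cut score. This is only one possible approach and is not necessarily the projection method any testing program used. It assumes both score sets come from the same (or an equivalent) group of students, and all scores and function names here are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(scores, value):
    """Proportion of scores at or below `value`."""
    ordered = sorted(scores)
    return bisect_right(ordered, value) / len(ordered)


def map_cut_score(external_scores, ccr_scores, external_cut):
    """Lowest CCR score whose percentile rank reaches that of the external cut."""
    target = percentile_rank(external_scores, external_cut)
    for score in sorted(set(ccr_scores)):
        if percentile_rank(ccr_scores, score) >= target:
            return score
    return max(ccr_scores)


# Hypothetical scores for the same ten students on both tests.
external = [14, 16, 17, 18, 19, 20, 21, 22, 24, 27]
ccr = [2310, 2380, 2410, 2450, 2490, 2520, 2560, 2590, 2640, 2700]

# An external cut of 21 sits at the 70th percentile of this sample.
ccr_cut = map_cut_score(external, ccr, external_cut=21)
print(ccr_cut)
```

With real data the result depends heavily on who took both tests, which is why the population and sampling assumptions matter.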
All these methods have been used and are in the literature.
Thanos and Karla will talk about some today!
Note all methods use an external criterion (e.g., test score, grades)
Issues in Setting CCR Standards Using External Criteria
• Defining “Success”
• Finding relevant external criteria
• Validating external criteria
• Deciding on research design(s)
• Defining probability of success criterion
Defining Success in College
• First-year GPA?
  – Harvard GPA = UMASS GPA?
  – Pre-med GPA = Psychology GPA?
How Should We Define “Success” in College?
• First-year GPA?
  – Harvard GPA = UMASS GPA?
  – Pre-med GPA = Psychology GPA?
• GPA in specific courses?
• Course completion?
• Number of credits?
• Graduation?
• Persistence?
Criteria for “success”: Previous Research
• ACT: 75% chance of a “C” or 50% chance of a “B” in specific, entry-level courses
• SAT: 65% chance of a “B-”
• AP: Score of 3
(see Allen & Sconing, 2005; Camara, 2013; Wyatt et al., 2013)
  – “Evidence-based standard setting”: Logistic regression, regression, equipercentile equating, and other methods can be used
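To make the "chance of a B" idea concrete, here is a minimal sketch of how a benchmark could be read off a fitted logistic regression: solve for the test score at which the predicted probability of success equals the chosen target. The intercept and slope below are hypothetical placeholders, not values from any of the cited studies, and the function names are illustrative.

```python
import math


def success_probability(score, intercept, slope):
    """Predicted probability of success at a given test score."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def benchmark_for(target_probability, intercept, slope):
    """Score at which the predicted probability equals the target."""
    logit = math.log(target_probability / (1.0 - target_probability))
    return (logit - intercept) / slope


# Hypothetical coefficients from a logistic regression of
# "earned a B or better" on test score.
INTERCEPT = -4.0
SLOPE = 0.25

cut_50 = benchmark_for(0.50, INTERCEPT, SLOPE)
cut_75 = benchmark_for(0.75, INTERCEPT, SLOPE)
print(round(cut_50, 2), round(cut_75, 2))
```

The same fitted model gives different benchmarks depending on the probability criterion chosen, which is one reason the criteria listed above are not directly comparable.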
Other limitations of Quantitative Approaches
• Content overlap
  – Different tests measure different skills (constructs), different purposes
• Differential motivation
• Different students (self-selection)
• Different time periods
  – (students change over time between testings)
How Should College Readiness Standards be Validated?
Two sources are particularly relevant:
1. Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014)
2. Kane (1994), Validating the performance standards associated with passing scores (see also Kane, 2001)
The AERA et al. (2014) Standards define validity as follows:
“Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (p. 11).
Several testing programs have prediction of CCR as an explicit use (purpose).
A “proposed use” for Smarter Balanced
“a primary goal of Smarter Balanced is that colleges and universities use student performance on the assessment system as evidence of readiness for college. Specifically, a test score that results in achievement levels 3 or 4 will be evidence that the student is ready for credit-bearing coursework…” (Smarter Balanced, 2012)
Standards’ Validation Framework
5 Sources of Validity evidence:
1. Test content
2. Response processes
3. Internal structure
4. Relations to other variables
5. Testing consequences
Standards’ Validation Framework
Sources of Validity evidence to validate college readiness standards:
1. Test content
3. Internal structure
4. Relations to other variables
5. Testing consequences
Standards’ Sources of Validity evidence to validate college readiness standards:
1. Test content
Demonstrating relevance of measured knowledge and skills to success in college is a fundamental requirement.
  – Alignment studies
  – Surveys (e.g., Conley et al., 2011)
Standards’ Sources of Validity evidence to validate college readiness standards:
3. Internal structure
Evidence is needed that students’ readiness classifications are reliable
  – Decision accuracy estimates
  – Decision consistency estimates
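As a minimal sketch of a decision consistency estimate, the code below classifies the same students as ready or not ready on two parallel forms and reports the proportion classified the same way, along with Cohen's kappa to adjust for chance agreement. It assumes two sets of scores per student are available, which is often not the case in practice (single-administration estimates need model-based methods not shown here). All scores are hypothetical.

```python
def decision_consistency(form_a, form_b, cut):
    """Return (proportion agreement, Cohen's kappa) for ready/not-ready calls."""
    ready_a = [score >= cut for score in form_a]
    ready_b = [score >= cut for score in form_b]
    n = len(ready_a)

    # Proportion of students classified the same way on both forms.
    p_agree = sum(a == b for a, b in zip(ready_a, ready_b)) / n

    # Agreement expected by chance from the marginal "ready" rates.
    p_a = sum(ready_a) / n
    p_b = sum(ready_b) / n
    p_chance = p_a * p_b + (1 - p_a) * (1 - p_b)

    # Undefined when p_chance == 1 (everyone in one category on both forms).
    kappa = (p_agree - p_chance) / (1 - p_chance)
    return p_agree, kappa


# Hypothetical scores for eight students on two parallel forms.
form_a = [12, 15, 18, 20, 22, 25, 27, 30]
form_b = [14, 13, 19, 23, 20, 26, 28, 29]

p_agree, kappa = decision_consistency(form_a, form_b, cut=21)
print(p_agree, kappa)
```

Students scoring near the cut drive most of the inconsistency, so the estimate depends on where the benchmark sits relative to the score distribution.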
Standards’ Sources of Validity evidence to validate college readiness standards:
4. Relations to other variables
Most research is in this area (e.g., previous table).
Typically uses another test or grades as validation criteria.
Valid to the extent external criteria are valid and threats to (internal) validity (e.g., sampling) are controlled.
Standards’ Sources of Validity evidence to validate college readiness standards:
5. Testing consequences
Do readiness benchmarks promote success in college or provide a barrier?
  – Adverse impact?
  – Dropout?
  – More prepared students?
  – Improvements over time?
The “other” framework for evaluating CCR benchmarks
• Comes from the standard setting literature, because after all, setting CCR benchmarks on tests is standard setting.
Kane’s (1994, 2001) Framework for Evaluating Standard Setting Studies
• 3 general sources of validity evidence
  – Procedural
  – Internal
  – External
Kane’s (1994, 2001) Framework for Evaluating Standard Setting Studies
• Important to note that one source of evidence is not enough.
  – Evaluation involves critiquing all sources of evidence.
  – Is there sufficient evidence to conclude cut scores are reasonable and defensible?
  – Are any fatal flaws identified?
Procedural validity evidence
“…the appropriateness of the procedures used and the quality of the implementation of these procedures” (Kane, 1994, p. 437).
  – Justification of standard setting method
  – Selection of panelists
  – Training panelists
  – Clarity of goals, tasks
  – Implementation of method
  – Documentation
Internal validity evidence
• Standard errors of cut scores
• Variability across panelists
  – Subgroups of panelists (independent panels, types of panelists)
• Variability across rounds
• Variability across item formats
• Consistency of panelists’ predictions with borderline students’ performance.
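The first few items in this list can be summarized numerically. The sketch below computes the panel's mean cut score, the spread across panelists, and a simple standard error of the cut score for each round. It treats panelists' recommendations as independent draws, which is a simplifying assumption (discussion between rounds violates it), and the panelist values are hypothetical.

```python
import math
import statistics


def cut_score_summary(panelist_cuts):
    """Return (mean cut, SD across panelists, standard error of the mean cut)."""
    mean_cut = statistics.mean(panelist_cuts)
    sd = statistics.stdev(panelist_cuts)  # sample standard deviation
    se = sd / math.sqrt(len(panelist_cuts))
    return mean_cut, sd, se


# Hypothetical recommended cut scores from six panelists in two rounds.
round_1 = [52, 58, 61, 49, 65, 55]
round_2 = [55, 57, 59, 54, 60, 57]

mean_1, sd_1, se_1 = cut_score_summary(round_1)
mean_2, sd_2, se_2 = cut_score_summary(round_2)
print(f"Round 1: mean={mean_1:.1f}, SD={sd_1:.2f}, SE={se_1:.2f}")
print(f"Round 2: mean={mean_2:.1f}, SD={sd_2:.2f}, SE={se_2:.2f}")
```

A shrinking standard error across rounds suggests convergence among panelists, though convergence alone does not show that the resulting cut score is appropriate.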
External Validity Evidence
• Degree to which classifications of examinees are consistent with other (external, independent) data.
  – Convergent or predictive validity data
• Classification consistency
  – Consistency across different standard setting methods
  – Consistency w/ respect to validity criterion
Sound similar? External benchmarking!
How can external data be used?
1. To set the readiness standards:
  – Readiness standard set using regression, projection, or some other statistical means
2. Inform standard setting:
  – Results used as reference points along score scale to suggest where standards might best be placed (“neighborhoods”)
3. Evaluate readiness standard after the fact (validation)
  – Similar to other readiness benchmarks?
  – How successful were students deemed “ready”?
Option 2: Informing Standard Setting Using External Data
“Texas” model
  – Policy group reviews results from studies
  – Sets up recommended “neighborhoods” where standards are most reasonable
  – Standard setting panelists set standards, can go outside neighborhood, but need to have a good reason
Example of Informing Standard Setting (TX)
[Figure: Smarter Balanced Math score scale with reference points marked for the chance score, CA EAP Math readiness score, AP Calculus score of 3, SAT readiness, ACT readiness, OR Math graduation test passing score, and GED Math passing score; a “neighborhood” region is indicated on the scale.]
Informing Standard Setting Using External Data (2)
• Neighborhood approach (constrain standard setting, AKA “policy capturing”)
• Provide data to standard setting panelists (“briefing booklet,” Haertel, 2012; “evidence-based standard setting,” Beimers et al., 2012)
Option 3: Validating Standards
• After readiness standards have been set, can
  – Evaluate how well students who met standard do in college
  – Compare standard to other measures of readiness
    • Put on same scale
    • Cross-tabulate to evaluate classification consistency
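Both checks on this slide can be sketched in a few lines: a cross-tabulation of "ready" classifications from the CCR benchmark against another readiness measure, and the college success rate among students the benchmark deemed ready. The student records and the definition of "succeeded" are hypothetical; in practice the success criterion would need to be defined and defended as discussed earlier.

```python
from collections import Counter


def cross_tabulate(ready_on_ccr, ready_on_other):
    """Count students in each (CCR ready, other-measure ready) cell."""
    return Counter(zip(ready_on_ccr, ready_on_other))


def success_rate_among_ready(ready_on_ccr, succeeded):
    """Proportion of CCR-ready students who later met the success criterion."""
    outcomes = [s for ready, s in zip(ready_on_ccr, succeeded) if ready]
    return sum(outcomes) / len(outcomes) if outcomes else None


# Hypothetical records for eight students.
ccr_ready = [True, True, True, False, False, True, False, True]
other_ready = [True, True, False, False, False, True, True, True]
succeeded = [True, True, False, False, True, True, False, True]

table = cross_tabulate(ccr_ready, other_ready)
agreement = (table[(True, True)] + table[(False, False)]) / len(ccr_ready)
rate = success_rate_among_ready(ccr_ready, succeeded)
print(dict(table), agreement, rate)
```

The off-diagonal cells (ready on one measure but not the other) are usually the most informative part of the table, since they show where the two benchmarks disagree about the same students.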
Discussion
• We know there are limitations to any method for setting CCR benchmarks
• We also know there are some fundamental requirements that should be in place for CCR benchmarks to be defensible.
Five Requirements for Valid CCR Benchmarks
1. Validity evidence based on test content:
  – Content of assessments should reflect academic aspects of CCR
2. Validity evidence based on relations to other variables
  – Students’ test scores should be positively related to other measures of readiness
Content validity (alignment) requirement…
“…Statistical validation is not an alternative to subjective evaluation, but an extension of it. All statistical procedures for validating tests are based ultimately upon common sense agreement concerning what is being measured by a particular measurement process” (Ebel, 1956, pp. 274-275).
Requirements for Valid CCR Benchmarks (continued)
3. Validity evidence based on testing consequences
  – Evidence that CCR benchmarks are having intended effects
  – And are not presenting a barrier to students who may otherwise be successful in college or career.
4. Student CCR classifications should be reliable (consistent)
Requirements for Valid CCR Benchmarks (3)
5. Standard setting (benchmark setting) process should demonstrate procedural and internal validity according to Kane’s (1994, 2001) criteria.
  – Note external validity covered in requirement #2.
So, how should we set CCR readiness benchmarks?
• Or should we?
Setting CCR benchmarks requires a comprehensive research agenda to set and evaluate the benchmarks.
External data are currently popular, but should not be the sole determinant in setting benchmarks.
Interpreting and Reporting CCR benchmarks
• Dichotomous approach: Ready/Not Ready
• “Leveled” approach
  – No Recommendation, Possibly Qualified, Qualified, Extremely Qualified
• Probabilistic approach
  – Probability of success in college between .65 and .75
How should we report readiness?
• I might not have the best solution, but I know what we should NOT do…
Let psychometricians make the decisions.
Focus groups with key stakeholders, including students, and psychologists, should be used.
In reporting and interpreting CCR “scores”
• It is important to avoid
  – deterministic language
  – self-fulfilling prophecies
  – implying we have more precise information than we actually have.
• We need to do a lot more research on how to report CCR information.
• My opinion: Avoid reporting CCR below high school. Stick to within-grade/adjacent-grade achievement expectations.
Conclusions
• By using the validation frameworks provided by the AERA et al. Standards, and by Kane (1994, 2001), we can gather, analyze, and report the evidence we need to defend the validity of CCR benchmarks (if warranted!).
• By developing a research agenda around interpreting and reporting CCR scores, we can avoid negative consequences.
Thanks to Dr. Patelis for the invitation!
And to you for your attention.
Questions or Comments