Are We Ready for College and Career Readiness?

Stephen G. Sireci
University of Massachusetts Amherst
Sireci Psychometric Services, Inc.

Presentation at the National Conference on Student Assessment as part of the symposium, Validating Claims of College and Career Readiness (T. Patelis, Chair), June 23, 2015, San Diego, CA

College and Career Readiness Standards

• An integral part of Common Core, other state standards, and education reform initiatives
• Many examples
  – PARCC, SMARTER Balanced, MO, TX, VA
• Not to mention norm-referenced tests such as
  – ACT, SAT, PSAT, Compass, Explore, Plan

College and Career Readiness Standards

Many examples (continued)
• And don’t forget criterion-referenced tests
  – AP
• And criteria based on multiple indicators
  – College Board (2010)
  – NCES (2007)

The next slide, prepared with assistance from Duy Pham, shows some current examples.

Test-Based College Readiness Benchmarks

But what do these benchmarks mean?
• And how are they set?
• And are they
  – Accurate?
  – Useful?
  – Valid?

Mattern et al. (2014)

“Differences among the various definitions of CCR often lead to different conclusions about whether a student is ready for college and a career...These definitions vary on many factors, such as the indicators included, the operational definition of college success…and the model used for reporting CCR” (pp. 3-4).

Purposes of this presentation
1. Discuss methods used for setting CCR benchmarks
2. Discuss research design issues in setting CCR benchmarks
3. Discuss frameworks for validating CCR benchmarks
4. Provide recommendations for how college readiness benchmarks should be set, interpreted (described), and reported.

What methods could be used to set CCR benchmarks?
• “Qualitative”: standard setting panels (i.e., test-based standard setting).
• “Quantitative”: predictive or concurrent validity (linking, concordance) studies

Quantitative Research Designs
• Concurrent: Students take CCR assessment and external assessments (or courses) around same time
• Predictive: Students’ college GPA or other criteria gathered later (retrospective analysis)
• Linking studies: CCR items embedded in external assessments and/or vice versa
• Projection: Map cut-score from external assessment onto CCR test scale using population and sampling assumptions
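As an illustration of the projection design, the sketch below maps a cut score from an external assessment onto a CCR scale by matching percentile ranks, a simplified equipercentile approach. It assumes both score sets come from the same (or an equivalent) group of students. All scores, the external cut of 22, and the function names are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(scores, value):
    """Proportion of scores at or below `value`."""
    ordered = sorted(scores)
    return bisect_right(ordered, value) / len(ordered)


def score_at_rank(scores, rank):
    """Smallest observed score whose percentile rank is at least `rank`."""
    ordered = sorted(scores)
    for score in ordered:
        if percentile_rank(ordered, score) >= rank:
            return score
    return ordered[-1]


def project_cut(external_scores, ccr_scores, external_cut):
    """Map an external cut score onto the CCR scale by matching percentile ranks."""
    rank = percentile_rank(external_scores, external_cut)
    return score_at_rank(ccr_scores, rank)


# Hypothetical score distributions for the same group of students.
external_scores = [14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
ccr_scores = [2300, 2350, 2400, 2450, 2500, 2550, 2600, 2650, 2700, 2750]

projected_cut = project_cut(external_scores, ccr_scores, external_cut=22)
print(projected_cut)
```

With real data the two groups rarely match this neatly, which is where the population and sampling assumptions named above come in.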

All these methods have been used and are in the literature.

Thanos and Karla will talk about some today!

Note all methods use an external criterion (e.g., test score, grades)

Issues in Setting CCR Standards Using External Criteria
• Defining “Success”
• Finding relevant external criteria
• Validating external criteria
• Deciding on research design(s)
• Defining probability of success criterion

Defining Success in College
• First-year GPA?
  – Harvard GPA = UMASS GPA?
  – Pre-med GPA = Psychology GPA?

How Should We Define “Success” in College?
• First-year GPA?
  – Harvard GPA = UMASS GPA?
  – Pre-med GPA = Psychology GPA?
• GPA in specific courses?
• Course completion?
• Number of credits?
• Graduation?
• Persistence?

Criteria for “success”: Previous Research
• ACT: 75% chance of a “C” or 50% chance of a “B” in specific, entry-level courses
• SAT: 65% chance of a “B-”
• AP: Score of 3
(see Allen & Sconing, 2005; Camara, 2013; Wyatt et al., 2013)
  – “Evidence-based standard setting”: Logistic regression, regression, equipercentile equating, and other methods can be used
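To make the probability criteria concrete, here is a minimal sketch of how a benchmark could be read off a fitted logistic regression: solve for the test score at which the predicted probability of the chosen outcome (e.g., a “B” or better) equals the target. The intercept and slope are hypothetical values chosen for illustration, not estimates from any testing program.

```python
import math


def success_probability(score, intercept, slope):
    """Predicted P(success | score) under a logistic regression model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def benchmark_score(target_prob, intercept, slope):
    """Score at which the predicted probability of success equals target_prob."""
    logit = math.log(target_prob / (1.0 - target_prob))
    return (logit - intercept) / slope


# Hypothetical coefficients (assumed, not from any real study).
INTERCEPT = -4.0
SLOPE = 0.2

cut_50 = benchmark_score(0.50, INTERCEPT, SLOPE)  # 50% chance criterion
cut_75 = benchmark_score(0.75, INTERCEPT, SLOPE)  # 75% chance criterion
print(round(cut_50, 2), round(cut_75, 2))
```

The same model yields a different benchmark for every choice of target probability and success criterion, which is why that choice has to be justified.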

Other limitations of Quantitative Approaches
• Content overlap
  – Different tests measure different skills (constructs), different purposes
• Differential motivation
• Different students (self-selection)
• Different time periods
  – (students change over time between testings)

How Should College Readiness Standards be Validated?
Two sources are particularly relevant:
1. Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014)
2. Kane (1994) Validating the performance standards associated with passing scores (see also Kane 2001)

The AERA et al. (2014) Standards define validity as,

“Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (p. 11).

Several testing programs have prediction of CCR as an explicit use (purpose).

A “proposed use” for Smarter Balanced
“a primary goal of Smarter Balanced is that colleges and universities use student performance on the assessment system as evidence of readiness for college.

Specifically, a test score that results in achievement levels 3 or 4 will be evidence that the student is ready for credit-bearing coursework…”

(Smarter Balanced, 2012)

Standards’ Validation Framework
5 Sources of Validity evidence:
1. Test content
2. Response processes
3. Internal structure
4. Relations to other variables
5. Testing consequences

Standards’ Validation Framework
Sources of Validity evidence to validate college readiness standards:
1. Test content
3. Internal structure
4. Relations to other variables
5. Testing consequences

Standards’ Sources of Validity evidence to validate college readiness standards:
1. Test content
Demonstrating relevance of measured knowledge and skills to success in college is a fundamental requirement.
  – Alignment studies
  – Surveys (e.g., Conley et al., 2011)

Standards’ Sources of Validity evidence to validate college readiness standards:
3. Internal structure
Evidence is needed that students’ readiness classifications are reliable
  – Decision accuracy estimates
  – Decision consistency estimates
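As a simplified illustration of decision consistency, the sketch below compares ready/not-ready classifications of the same students from two parallel administrations and reports the proportion of consistent decisions along with Cohen's kappa (agreement corrected for chance). Operational programs usually estimate these indices from a single administration using a psychometric model, so the two-administration setup and the data here are assumptions for illustration only.

```python
def decision_consistency(first, second):
    """Return (proportion of consistent decisions, Cohen's kappa).

    `first` and `second` are equal-length lists of 1 (ready) / 0 (not ready)
    classifications for the same students.
    """
    if len(first) != len(second) or not first:
        raise ValueError("classification lists must be non-empty and equal length")
    n = len(first)
    agreement = sum(a == b for a, b in zip(first, second)) / n
    p_first = sum(first) / n
    p_second = sum(second) / n
    # Agreement expected by chance given each administration's "ready" rate.
    chance = p_first * p_second + (1 - p_first) * (1 - p_second)
    kappa = (agreement - chance) / (1 - chance)
    return agreement, kappa


# Hypothetical classifications for ten students on two parallel forms.
form_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
form_b = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

agreement, kappa = decision_consistency(form_a, form_b)
print(round(agreement, 2), round(kappa, 2))
```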

Standards’ Sources of Validity evidence to validate college readiness standards:

4. Relations to other variables
Most research is in this area (e.g., previous table).
Typically uses another test or grades as validation criteria.
Valid to the extent external criteria are valid and threats to (internal) validity (e.g., sampling) are controlled.

Standards’ Sources of Validity evidence to validate college readiness standards:

5. Testing consequences
Do readiness benchmarks promote success in college or provide a barrier?

Adverse impact? Dropout? More prepared students? Improvements over time?

The “other” framework for evaluating CCR benchmarks
• Comes from the standard setting literature, because after all, setting CCR benchmarks on tests is standard setting.

Kane’s (1994, 2001) Framework for Evaluating Standard Setting Studies
• 3 General sources of validity evidence
  – Procedural
  – Internal
  – External

Kane’s (1994, 2001) Framework for Evaluating Standard Setting Studies
• Important to note that one source of evidence is not enough.
  – Evaluation involves critiquing all sources of evidence.
  – Is there sufficient evidence to conclude cut scores are reasonable and defensible?
  – Are any fatal flaws identified?

Procedural validity evidence
“…the appropriateness of the procedures used and the quality of the implementation of these procedures” (Kane, 1994, p. 437).
  – Justification of standard setting method
  – Selection of panelists
  – Training panelists
  – Clarity of goals, tasks
  – Implementation of method
  – Documentation

Internal validity evidence
• Standard errors of cut scores
• Variability across panelists
  – Subgroups of panelists (independent panels, types of panelists)
• Variability across rounds
• Variability across item formats
• Consistency of panelists’ predictions with borderline students’ performance.
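One common way to express the first item is the standard error of the mean of panelists' recommended cut scores: the standard deviation across panelists divided by the square root of the number of panelists. The sketch below assumes panelists' final-round recommendations can be treated as independent. The cut scores are hypothetical.

```python
import math
import statistics


def cut_score_standard_error(cuts):
    """Standard error of the mean cut score across panelists (SD / sqrt(n))."""
    return statistics.stdev(cuts) / math.sqrt(len(cuts))


# Hypothetical final-round cut score recommendations from five panelists.
panelist_cuts = [2480, 2500, 2520, 2490, 2510]

recommended_cut = statistics.mean(panelist_cuts)
standard_error = cut_score_standard_error(panelist_cuts)
print(recommended_cut, round(standard_error, 2))
```

The same calculation can be repeated within subgroups of panelists or across rounds to look at the other sources of variability in the list.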

External Validity Evidence

• Degree to which classifications of examinees are consistent with other (external, independent) data.
  – Convergent or predictive validity data
• Classification consistency
  – Consistency across different standard setting methods
  – Consistency w/ respect to validity criterion

Sound similar? External benchmarking!

How can external data be used?
1. To set the readiness standards:
  – Readiness standard set using regression, projection, or some other statistical means
2. Inform standard setting:
  – Results used as reference points along score scale to suggest where standards might best be placed (“neighborhoods”)
3. Evaluate readiness standard after the fact (validation)
  – Similar to other readiness benchmarks?
  – How successful were students deemed “ready”?

Option 2: Informing Standard Setting Using External Data

“Texas” model
  – Policy group reviews results from studies
  – Sets up recommended “neighborhoods” where standards are most reasonable
  – Standard setting panelists set standards, can go outside neighborhood, but need to have a good reason

Example of Informing Standard Setting (TX)

[Figure: Smarter Balanced Math Score Scale, marked with external reference points (CA EAP Math readiness score, AP Calculus score of 3, SAT Readiness, ACT Readiness, Chance score, OR Math graduation test passing score, GED Math passing score) and the recommended Neighborhood]

Informing Standard Setting Using External Data (2)
• Neighborhood approach (constrain standard setting, AKA “policy capturing”)
• Provide data to standard setting panelists (“briefing booklet,” Haertel, 2012; “evidence-based standard setting,” Beimers et al., 2012)

Option 3: Validating Standards

• After readiness standards have been set, can
  – Evaluate how well students who met standard do in college
  – Compare standard to other measures of readiness
      • Put on same scale
      • Cross-tabulate to evaluate classification consistency
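A small sketch of the first check, under the assumption that each student has a met-standard flag and a later success indicator (however “success” ends up being defined): cross-tabulate the two and compare success rates for students above and below the standard. The data and function names are hypothetical.

```python
from collections import Counter


def cross_tabulate(met_standard, succeeded):
    """Count students in each (met standard, succeeded) cell."""
    return Counter(zip(met_standard, succeeded))


def success_rate(table, met):
    """Proportion of students with met-standard status `met` who succeeded."""
    total = table[(met, True)] + table[(met, False)]
    return table[(met, True)] / total if total else float("nan")


# Hypothetical records for ten students.
met_standard = [True, True, True, True, False, False, False, True, False, False]
succeeded = [True, True, False, True, False, True, False, True, False, False]

table = cross_tabulate(met_standard, succeeded)
ready_rate = success_rate(table, True)
not_ready_rate = success_rate(table, False)
print(dict(table), ready_rate, not_ready_rate)
```

The same cross-tabulation works for comparing the standard against another readiness benchmark once both are expressed as ready/not-ready classifications.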

Discussion

• We know there are limitations to any method for setting CCR benchmarks
• We also know there are some fundamental requirements that should be in place for CCR benchmarks to be defensible.

Five Requirements for Valid CCR Benchmarks

1. Validity evidence based on test content:
  – Content of assessments should reflect academic aspects of CCR
2. Validity evidence based on relations to other variables
  – Students’ test scores should be positively related to other measures of readiness

Content validity (alignment) requirement…
“…Statistical validation is not an alternative to subjective evaluation, but an extension of it. All statistical procedures for validating tests are based ultimately upon common sense agreement concerning what is being measured by a particular measurement process” (Ebel, 1956, pp. 274-275).

Requirements for Valid CCR Benchmarks (continued)
3. Validity evidence based on testing consequences
  – Evidence that CCR benchmarks are having intended effects
  – And are not presenting a barrier to students who may otherwise be successful in college or career.
4. Student CCR classifications should be reliable (consistent)

Requirements for Valid CCR Benchmarks (3)
5. Standard setting (benchmark setting) process should demonstrate procedural and internal validity according to Kane’s (1994, 2001) criteria.
  – Note external validity covered in requirement #2.

So, how should we set CCR benchmarks?
• Or should we?

Setting CCR benchmarks requires a comprehensive research agenda to set and evaluate the benchmarks.

External data are currently popular, but should not be the sole determinant in setting benchmarks.

Interpreting and Reporting CCR benchmarks
• Dichotomous approach: Ready/Not Ready
• “Leveled” Approach
  – No Recommendation, Possibly Qualified, Qualified, Extremely Qualified
• Probabilistic Approach
  – Probability of success in college between .65 and .75

How should we report readiness?
• I might not have the best solution, but I know what we should NOT do…

Let psychometricians make the decisions.

Focus groups with key stakeholders, including students, and psychologists, should be used.

In reporting and interpreting CCR “scores”

• It is important to avoid
  – deterministic language
  – self-fulfilling prophecies
  – implying we have more precise information than we actually have.
• We need to do a lot more research on how to report CCR information.
• My opinion: Avoid reporting CCR below high school. Stick to within-grade/adjacent grade achievement expectations

Conclusions
• By using the validation frameworks provided by the AERA et al. Standards, and by Kane (1994, 2001), we can gather, analyze, and report the evidence we need to defend the validity of CCR benchmarks (if warranted!).
• By developing a research agenda around interpreting and reporting CCR scores, we can avoid negative consequences.

Thanks to Dr. Patelis for the invitation!

And to you for your attention.

Questions or Comments

[email protected]
