Are We Ready for College and Career Readiness?

Stephen G. Sireci
University of Massachusetts Amherst
Sireci Psychometric Services, Inc.

Presentation at the National Conference on Student Assessment as part of the symposium, Validating Claims of College and Career Readiness (T. Patelis, Chair), June 23, 2015, San Diego, CA


College and Career Readiness Standards

• An integral part of Common Core, other state standards, and education reform initiatives
• Many examples
  – PARCC, SMARTER Balanced, MO, TX, VA
• Not to mention norm-referenced tests such as
  – ACT, SAT, PSAT, Compass, Explore, Plan

College and Career Readiness Standards

Many examples (continued)
• And don’t forget criterion-referenced tests
  – AP
• And criteria based on multiple indicators
  – College Board (2010)
  – NCES (2007)

The next slide, prepared with assistance from Duy Pham, shows some current examples.


Test-Based College Readiness Benchmarks

But what do these benchmarks mean?

• And how are they set?
• And are they
  – Accurate?
  – Useful?
  – Valid?


Mattern et al. (2014)

“Differences among the various definitions of CCR often lead to different conclusions about whether a student is ready for college and a career...These definitions vary on many factors, such as the indicators included, the operational definition of college success…and the model used for reporting CCR” (pp. 3-4).

Purposes of this presentation

1. Discuss methods used for setting CCR benchmarks
2. Discuss research design issues in setting CCR benchmarks
3. Discuss frameworks for validating CCR benchmarks
4. Provide recommendations for how college readiness benchmarks should be set, interpreted (described), and reported.

What methods could be used to set CCR benchmarks?

• “Qualitative”: standard setting panels (i.e., test-based standard setting).
• “Quantitative”: predictive or concurrent validity (linking, concordance) studies

Quantitative Research Designs

• Concurrent: Students take CCR assessment and external assessments (or courses) around same time
• Predictive: Students’ college GPA or other criteria gathered later (retrospective analysis)
• Linking studies: CCR items embedded in external assessments and/or vice-versa
• Projection: Map cut-score from external assessment onto CCR test scale using population and sampling assumptions

All these methods have been used and are in the literature. Thanos and Karla will talk about some today!

Note that all methods use an external criterion (e.g., test score, grades).

Issues in Setting CCR Standards Using External Criteria

• Defining “Success”
• Finding relevant external criteria
• Validating external criteria
• Deciding on research design(s)
• Defining probability of success criterion

Defining Success in College

• First-year GPA?
  – Harvard GPA = UMASS GPA?
  – Pre-med GPA = Psychology GPA?

How Should We Define “Success” in College?

• First-year GPA?
  – Harvard GPA = UMASS GPA?
  – Pre-med GPA = Psychology GPA?
• GPA in specific courses?
• Course completion?
• Number of credits?
• Graduation?
• Persistence?

Criteria for “success”: Previous Research

• ACT: 75% chance of a “C” or 50% chance of a “B” in specific, entry-level courses
• SAT: 65% chance of a “B-”
• AP: Score of 3
(see Allen & Sconing, 2005; Camara, 2013; Wyatt et al., 2013)
  – “Evidence-based standard setting”: Logistic regression, regression, equipercentile equating, and other methods can be used
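As a rough illustration of the logistic-regression route to a benchmark, the sketch below fits P(success) as a function of test score and then solves for the score at which that probability reaches a chosen level (for example, .75 or .50). Everything here is an assumption made for illustration: the scores and outcomes are invented, "success" is a hypothetical indicator such as earning a given grade, and the Newton-Raphson fit is a plain textbook routine, not the procedure any testing program actually used.

```python
import math

def fit_logistic(scores, outcomes, iterations=25):
    """Fit P(success) = 1 / (1 + exp(-(b0 + b1 * score))) by Newton-Raphson."""
    mean = sum(scores) / len(scores)
    x = [s - mean for s in scores]  # center scores for numerical stability
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p           # gradient, intercept
            g1 += (yi - p) * xi    # gradient, slope
            h00 += w               # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a - b * mean, b  # undo the centering: intercept on the raw scale

def benchmark_score(b0, b1, probability):
    """Score at which the fitted probability of success equals `probability`."""
    logit = math.log(probability / (1.0 - probability))
    return (logit - b0) / b1

# Hypothetical data: test scores and whether each student met the success criterion.
scores = [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
outcomes = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]

b0, b1 = fit_logistic(scores, outcomes)
cut_50 = benchmark_score(b0, b1, 0.50)
cut_75 = benchmark_score(b0, b1, 0.75)
print(f"score for 50% chance: {cut_50:.1f}; score for 75% chance: {cut_75:.1f}")
```

The choice of probability level is a policy decision, not a statistical one: the same fitted curve yields a different benchmark for each level.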

Other limitations of Quantitative Approaches

• Content overlap
  – Different tests measure different skills (constructs), different purposes
• Differential motivation
• Different students (self-selection)
• Different time periods
  – (students change over time between testings)

How Should College Readiness Standards be Validated?

Two sources are particularly relevant:

1. Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014)
2. Kane (1994), Validating the performance standards associated with passing scores (see also Kane, 2001)

The AERA et al. (2014) Standards define validity as follows:

“Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (p. 11).

Several testing programs have prediction of CCR as an explicit use (purpose).

A “proposed use” for Smarter Balanced

“a primary goal of Smarter Balanced is that colleges and universities use student performance on the assessment system as evidence of readiness for college. Specifically, a test score that results in achievement levels 3 or 4 will be evidence that the student is ready for credit-bearing coursework…” (Smarter Balanced, 2012)

Standards’ Validation Framework

5 Sources of Validity evidence:
1. Test content
2. Response processes
3. Internal structure
4. Relations to other variables
5. Testing consequences

Standards’ Validation Framework

Sources of Validity evidence to validate college readiness standards:
1. Test content
3. Internal structure
4. Relations to other variables
5. Testing consequences

Standards’ Sources of Validity evidence to validate college readiness standards:

1. Test content

Demonstrating relevance of measured knowledge and skills to success in college is a fundamental requirement.
  – Alignment studies
  – Surveys (e.g., Conley et al., 2011)

Standards’ Sources of Validity evidence to validate college readiness standards:

3. Internal structure

Evidence is needed that students’ readiness classifications are reliable
  – Decision accuracy estimates
  – Decision consistency estimates
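One simple way to look at decision consistency is to classify the same students on two parallel administrations and ask how often the ready/not-ready decision agrees. The sketch below computes the raw agreement rate and Cohen's kappa (agreement corrected for chance). The scores, the cut score of 20, and the two-form design are all hypothetical; operational programs often estimate consistency from a single administration with model-based methods instead.

```python
def classify(scores, cut):
    """Return True (ready) or False (not ready) for each score."""
    return [s >= cut for s in scores]

def decision_consistency(form_a, form_b, cut):
    """Proportion of identical decisions across two forms, plus Cohen's kappa."""
    a = classify(form_a, cut)
    b = classify(form_b, cut)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    rate_a = sum(a) / n  # proportion classified ready on form A
    rate_b = sum(b) / n  # proportion classified ready on form B
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)  # chance agreement
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical scores for the same ten students on two parallel forms.
form_a = [12, 15, 18, 20, 21, 22, 25, 27, 30, 19]
form_b = [14, 13, 20, 19, 23, 21, 26, 25, 29, 22]

agreement, kappa = decision_consistency(form_a, form_b, cut=20)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Students scoring near the cut account for the disagreements, which is why consistency tends to fall as more examinees cluster around the benchmark.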

Standards’ Sources of Validity evidence to validate college readiness standards:

4. Relations to other variables

Most research is in this area (e.g., previous table).
Typically uses another test or grades as validation criteria.
Valid to the extent external criteria are valid and threats to (internal) validity (e.g., sampling) are controlled.

Standards’ Sources of Validity evidence to validate college readiness standards:

5. Testing consequences

Do readiness benchmarks promote success in college or provide a barrier?

Adverse impact? Dropout? More prepared students? Improvements over time?

The “other” framework for evaluating CCR benchmarks

• Comes from the standard setting literature, because after all, setting CCR benchmarks on tests is standard setting.

Kane’s (1994, 2001) Framework for Evaluating Standard Setting Studies

• 3 General sources of validity evidence
  – Procedural
  – Internal
  – External

Kane’s (1994, 2001) Framework for Evaluating Standard Setting Studies

• Important to note that one source of evidence is not enough.
  – Evaluation involves critiquing all sources of evidence.
  – Is there sufficient evidence to conclude cut scores are reasonable and defensible?
  – Are any fatal flaws identified?

Procedural validity evidence

“…the appropriateness of the procedures used and the quality of the implementation of these procedures” (Kane, 1994, p. 437).
  – Justification of standard setting method
  – Selection of panelists
  – Training panelists
  – Clarity of goals, tasks
  – Implementation of method
  – Documentation

Internal validity evidence

• Standard errors of cut scores
• Variability across panelists
  – Subgroups of panelists (independent panels, types of panelists)
• Variability across rounds
• Variability across item formats
• Consistency of panelists’ predictions with borderline students’ performance.
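The first two items can be sketched with a few lines of arithmetic. Assuming each panelist recommends one cut score and panelists are treated as an independent random sample (both simplifying assumptions, and the ratings below are invented), the panel's cut is the mean recommendation and its standard error is the standard deviation across panelists divided by the square root of the panel size.

```python
import math
import statistics

def cut_score_summary(panelist_cuts):
    """Mean recommended cut, SD across panelists, and standard error of the mean."""
    mean_cut = statistics.mean(panelist_cuts)
    sd = statistics.stdev(panelist_cuts)  # sample SD (n - 1 denominator)
    standard_error = sd / math.sqrt(len(panelist_cuts))
    return mean_cut, sd, standard_error

# Hypothetical final-round cut scores from eight panelists.
panelist_cuts = [24, 26, 25, 27, 23, 25, 26, 24]

mean_cut, sd, standard_error = cut_score_summary(panelist_cuts)
print(f"cut = {mean_cut}, SD = {sd:.2f}, SE = {standard_error:.2f}")
```

Running the same summary separately by round or by panelist subgroup gives a rough view of the other variability checks listed above.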

External Validity Evidence

• Degree to which classifications of examinees are consistent with other (external, independent) data.
  – Convergent or predictive validity data
• Classification consistency
  – Consistency across different standard setting methods
  – Consistency w/ respect to validity criterion

Sound similar? External benchmarking!

How can external data be used?

1. To set the readiness standards:
   – Readiness standard set using regression, projection, or some other statistical means
2. Inform standard setting:
   – Results used as reference points along score scale to suggest where standards might best be placed (“neighborhoods”)
3. Evaluate readiness standard after the fact (validation)
   – Similar to other readiness benchmarks?
   – How successful were students deemed “ready?”

Option 2: Informing Standard Setting Using External Data

“Texas” model
  – Policy group reviews results from studies
  – Sets up recommended “neighborhoods” where standards are most reasonable
  – Standard setting panelists set standards, can go outside neighborhood, but need to have a good reason
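The neighborhood idea can be pictured as a simple range check. In the sketch below the neighborhood is taken to be the span of the external reference points on the test's score scale, optionally widened by a margin; that rule, the labels, and the scale values are all hypothetical, since in practice the policy group decides how the neighborhood is drawn.

```python
def neighborhood(reference_points, margin=0):
    """Span of the external reference points, widened by an optional margin."""
    values = reference_points.values()
    return min(values) - margin, max(values) + margin

def needs_rationale(panel_cut, bounds):
    """True when the panel's cut falls outside the recommended neighborhood."""
    low, high = bounds
    return not (low <= panel_cut <= high)

# Hypothetical external benchmarks already mapped onto the test's score scale.
reference_points = {
    "external benchmark A": 2560,
    "external benchmark B": 2595,
    "external benchmark C": 2610,
}

bounds = neighborhood(reference_points)
for panel_cut in (2580, 2650):
    flag = "outside, rationale needed" if needs_rationale(panel_cut, bounds) else "inside"
    print(panel_cut, flag)
```

The check only flags a recommendation for discussion; it does not override the panel, consistent with panelists being allowed to leave the neighborhood for good reason.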

Example of Informing Standard Setting (TX)

[Figure: Smarter Balanced Math Score Scale with external reference points marked along it. Labels: CA EAP Math readiness score; AP Calculus score of 3; SAT Readiness; ACT Readiness; Chance score; OR Math graduation test passing score; GED Math passing score; and the recommended “Neighborhood.”]

Informing Standard Setting Using External Data (2)

• Neighborhood approach (constrain standard setting, AKA “policy capturing”)
• Provide data to standard setting panelists (“briefing booklet,” Haertel, 2012; “evidence-based standard setting,” Beimers et al., 2012)

Option 3: Validating Standards

• After readiness standards have been set, can
  – Evaluate how well students who met standard do in college
  – Compare standard to other measures of readiness
    • Put on same scale
    • Cross-tabulate to evaluate classification consistency
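The last two steps can be sketched for a single group of students who took both tests. The external cut is carried onto the CCR scale by matching the number of students falling below it (a crude equipercentile-style mapping), and the two ready/not-ready classifications are then cross-tabulated. The paired scores, the external cut of 22, and the scale values are hypothetical, and a real study would use smoothed equating with a far larger sample.

```python
def map_cut_equipercentile(external_scores, ccr_scores, external_cut):
    """CCR score with the same number of students below it as the external cut."""
    n_below = sum(s < external_cut for s in external_scores)
    ordered = sorted(ccr_scores)
    return ordered[min(n_below, len(ordered) - 1)]

def cross_tabulate(external_scores, ccr_scores, external_cut, ccr_cut):
    """2x2 table of readiness decisions keyed by (external_ready, ccr_ready)."""
    table = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for ext, ccr in zip(external_scores, ccr_scores):
        table[(ext >= external_cut, ccr >= ccr_cut)] += 1
    return table

# Hypothetical paired scores for the same ten students.
external = [18, 20, 21, 22, 23, 24, 25, 27, 28, 30]
ccr = [2400, 2450, 2520, 2480, 2560, 2540, 2600, 2630, 2610, 2700]

ccr_cut = map_cut_equipercentile(external, ccr, external_cut=22)
table = cross_tabulate(external, ccr, external_cut=22, ccr_cut=ccr_cut)
consistency = (table[(True, True)] + table[(False, False)]) / len(external)
print(ccr_cut, table, consistency)
```

The off-diagonal cells are the students the two benchmarks classify differently, which is where the content-overlap and motivation limitations noted earlier would show up.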

Discussion

• We know there are limitations to any method for setting CCR benchmarks
• We also know there are some fundamental requirements that should be in place for CCR benchmarks to be defensible.

Five Requirements for Valid CCR Benchmarks

1. Validity evidence based on test content:
   – content of assessments should reflect academic aspects of CCR
2. Validity evidence based on relations to other variables
   – Students’ test scores should be positively related to other measures of readiness

Content validity (alignment) requirement…

“…Statistical validation is not an alternative to subjective evaluation, but an extension of it. All statistical procedures for validating tests are based ultimately upon common sense agreement concerning what is being measured by a particular measurement process” (Ebel, 1956, pp. 274-275).

Requirements for Valid CCR Benchmarks (continued)

3. Validity evidence based on testing consequences
   – Evidence that CCR benchmarks are having intended effects
   – And are not presenting a barrier to students who may otherwise be successful in college or career.
4. Student CCR classifications should be reliable (consistent)

Requirements for Valid CCR Benchmarks (3)

5. Standard setting (benchmark setting) process should demonstrate procedural and internal validity according to Kane’s (1994, 2001) criteria.
   – Note external validity covered in requirement #2.

So, how should we set CCR readiness benchmarks?

• Or should we?

Setting CCR benchmarks requires a comprehensive research agenda to set and evaluate the benchmarks.

External data are currently popular, but should not be the sole determinant in setting benchmarks.

Interpreting and Reporting CCR benchmarks

• Dichotomous approach: Ready/Not Ready
• “Leveled” Approach
  – No Recommendation, Possibly Qualified, Qualified, Extremely Qualified
• Probabilistic Approach
  – Probability of success in college between .65 and .75

How should we report readiness?

• I might not have the best solution, but I know what we should NOT do…

Let psychometricians make the decisions.

Focus groups with key stakeholders, including students, and psychologists, should be used.

In reporting and interpreting CCR “scores”

• It is important to avoid
  – deterministic language
  – self-fulfilling prophecies
  – implying we have more precise information than we actually have.
• We need to do a lot more research on how to report CCR information.
• My opinion: Avoid reporting CCR below high school. Stick to within-grade/adjacent grade achievement expectations

Conclusions

• By using the validation frameworks provided by the AERA et al. Standards, and by Kane (1994, 2001), we can gather, analyze, and report the evidence we need to defend the validity of CCR benchmarks (if warranted!).
• By developing a research agenda around interpreting and reporting CCR scores, we can avoid negative consequences.


Thanks to Dr. Patelis for the invitation!

And to you for your attention.

Questions or Comments

[email protected]